Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

In this notebook, a template is provided for you to implement your functionality in stages, which is required to successfully complete this project. If additional code is needed that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission.

Note: Once you have completed all of the code implementations, you need to finalize your work by exporting the IPython Notebook as an HTML document. Before exporting the notebook to HTML, all of the code cells need to have been run so that reviewers can see the final implementation and output. You can then export the notebook by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.

In addition to implementing code, there is a writeup to complete. The writeup should be completed in a separate file, which can be either a markdown file or a PDF document. There is a writeup template that can be used to guide the writing process. Completing the code template and writeup template will cover all of the rubric points for this project.

The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. The stand out suggestions are optional. If you decide to pursue them, you can include the code in this IPython notebook and also discuss the results in the writeup file.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited, typically by double-clicking the cell to enter edit mode.


Step 0: Load The Data

In [415]:
# Visualizations will be shown in the notebook.
%matplotlib inline
from importlib import reload
import utils; reload(utils)
from utils import *
In [2]:
# Import crucial modules
import pickle
import tensorflow as tf
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
import cv2

from sklearn.utils import shuffle
import time
In [3]:
datasets_path = "../datasets/german_traffic_signs/"
models_path = "./models/"

training_file = datasets_path + 'train.p'
validation_file= datasets_path + 'valid.p'
testing_file = datasets_path + 'test.p'

with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(validation_file, mode='rb') as f:
    valid = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
    
X_train, y_train = train['features'], train['labels']
X_valid, y_valid = valid['features'], valid['labels']
X_test, y_test = test['features'], test['labels']

Step 1: Dataset Summary & Exploration

The pickled data is a dictionary with 4 key/value pairs:

  • 'features' is a 4D array containing raw pixel data of the traffic sign images, (num examples, width, height, channels).
  • 'labels' is a 1D array containing the label/class id of the traffic sign. The file signnames.csv contains id -> name mappings for each id.
  • 'sizes' is a list containing tuples, (width, height), representing the original width and height of the image.
  • 'coords' is a list containing tuples, (x1, y1, x2, y2) representing coordinates of a bounding box around the sign in the image. THESE COORDINATES ASSUME THE ORIGINAL IMAGE. THE PICKLED DATA CONTAINS RESIZED VERSIONS (32 by 32) OF THESE IMAGES
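As a quick sanity check, the dictionary structure described above can be inspected directly. The sketch below uses a small synthetic stand-in for the pickled dictionary (the real dataset files are not bundled here):

```python
import numpy as np

# Synthetic stand-in for the pickled dictionary (not the real dataset)
train = {
    'features': np.zeros((4, 32, 32, 3), dtype=np.uint8),  # 4D: (num examples, width, height, channels)
    'labels': np.array([0, 1, 1, 2]),                      # 1D class ids
    'sizes': [(45, 50)] * 4,                               # original (width, height) per image
    'coords': [(5, 5, 40, 44)] * 4,                        # bounding boxes in ORIGINAL image coordinates
}

X, y = train['features'], train['labels']
print(X.shape)        # (4, 32, 32, 3)
print(len(set(y)))    # 3 unique classes
```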

Complete the basic data summary below. Use Python, numpy and/or pandas methods to calculate the data summary rather than hard-coding the results. For example, the pandas shape attribute might be useful for calculating some of the summary results.

Provide a Basic Summary of the Data Set Using Python, Numpy and/or Pandas

In [4]:
# Let's get some dimensions
print("Features shape: ", X_train.shape)
print("Labels shape: ", y_train.shape)
Features shape:  (34799, 32, 32, 3)
Labels shape:  (34799,)
In [226]:
# Number of training examples
n_train = X_train.shape[0]

# Number of validation examples
n_validation = X_valid.shape[0]

# Number of testing examples.
n_test = X_test.shape[0]

# What's the shape of a traffic sign image?
image_shape = X_train.shape[1:]

# How many unique classes/labels are there in the dataset?
n_classes = len(set(y_train))

print("Number of training examples =", n_train)
print("Number of validation examples =", n_validation)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
Number of training examples = 34799
Number of validation examples = 4410
Number of testing examples = 12630
Image data shape = (32, 32, 3)
Number of classes = 43

Group the images by class

We leverage the pandas module to group the images by label. Our first step is to import the signnames.csv file, which contains English descriptions of the labels.

In [6]:
# Load sign names file
sign_names = pd.read_csv("signnames.csv")
# Note: set_index returns a new DataFrame rather than modifying in place;
# without reassignment the call would be a no-op, so we keep ClassId as a regular column

sign_names.head(n=3)
Out[6]:
ClassId SignName
0 0 Speed limit (20km/h)
1 1 Speed limit (30km/h)
2 2 Speed limit (50km/h)

Group Images In Training set

In [7]:
X_train_id_to_label = group_img_id_to_lbl(y_train, sign_names)

# Let's print a few of those mappings now
X_train_id_to_label.head(n=3)
Out[7]:
img_id label_id label_name
0 0 41 End of no passing
1 1 41 End of no passing
2 2 41 End of no passing
In [8]:
# We should group by label id to understand the distribution
X_train_group_by_label_count = group_img_id_to_lb_count(X_train_id_to_label)
X_train_group_by_label_count.head(n=5)
Out[8]:
img_id
label_id label_name
0 Speed limit (20km/h) 180
1 Speed limit (30km/h) 1980
2 Speed limit (50km/h) 2010
3 Speed limit (60km/h) 1260
4 Speed limit (70km/h) 1770

We can already witness significant data imbalance across the classes in the training set. For instance, class 0 has fewer than 200 images, while class 2 has over 2,000. Let's plot the number of images against the class to better understand visually where we stand.
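The per-class counts can also be obtained without pandas. A minimal numpy sketch on synthetic labels (the real counts come from the grouped DataFrame above):

```python
import numpy as np

# Synthetic labels mimicking the imbalance: class 0 rare, class 2 common
y_demo = np.array([0] * 3 + [1] * 10 + [2] * 20)
counts = np.bincount(y_demo, minlength=3)
print(counts)                        # [ 3 10 20]
print(counts.max() / counts.min())   # imbalance ratio between largest and smallest class
```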

Training Set Distribution Visualization

In [434]:
# Now count the number of images for each class and plot the results so that we understand the distribution
X_train_group_by_label_count.plot(kind='bar', figsize=(15, 7))
Out[434]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd5303825c0>

This confirms the significant class imbalance we observed earlier. At this stage we will not attempt to address it.

Training Set Image Visualization

In [10]:
# Let's plot images from each of these classes
X_train_group_by_label = X_train_id_to_label.groupby(["label_id", "label_name"])
img_per_class = 5

show_random_dataset_images(X_train_group_by_label, X_train)    

Group Images In Validation Set

In [11]:
# Create a data frame whereby we index image id  with label
X_valid_id_to_label = group_img_id_to_lbl(y_valid, sign_names)

# Let's print a few of those mappings now
X_valid_id_to_label.head(n=3)
Out[11]:
img_id label_id label_name
0 0 41 End of no passing
1 1 41 End of no passing
2 2 41 End of no passing
In [12]:
# We should group by label id to understand the distribution
X_valid_group_by_label_count = group_img_id_to_lb_count(X_valid_id_to_label)
X_valid_group_by_label_count.head(n=3)
Out[12]:
img_id
label_id label_name
0 Speed limit (20km/h) 30
1 Speed limit (30km/h) 240
2 Speed limit (50km/h) 240

Validation Set Distribution Visualization

In [13]:
# Now count the number of images for each class and plot the results so that we understand the distribution
X_valid_group_by_label_count.plot(kind='bar', figsize=(15, 7))
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd73eee9b00>

We also observe significant imbalance in the validation set. This is less of a concern, as we do not train on this set.

Validation Set Images Visualization

In [14]:
# Let's plot images from each of these classes
X_valid_group_by_label = X_valid_id_to_label.groupby(["label_id", "label_name"])

show_random_dataset_images(X_valid_group_by_label, X_valid)

The validation set seems to suffer from the same problems as the training set, where some images are extremely dark.

Create a sample set

As we have over 30,000 images, we do not want to try every model on the whole dataset, as this would be extremely time-consuming. Instead, we create sample datasets for training and validation by copying a percentage P of both datasets.
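The create_sample_set helper lives in utils and is not shown here. A hypothetical sketch of the per-class sampling it might perform (the function name, signature, and seed handling below are assumptions):

```python
import numpy as np

def sample_per_class(X, y, pct=0.33, seed=0):
    """Keep roughly `pct` of the images of each class, preserving the class distribution.
    Hypothetical sketch of what create_sample_set might do internally."""
    rng = np.random.RandomState(seed)
    keep = []
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        n = max(1, int(len(idx) * pct))          # at least one image per class
        keep.extend(rng.choice(idx, size=n, replace=False))
    keep = np.sort(np.array(keep))
    return X[keep], y[keep]

# Tiny synthetic example: 10 images of class 0, 20 of class 1
X_demo = np.zeros((30, 32, 32, 3), dtype=np.uint8)
y_demo = np.array([0] * 10 + [1] * 20)
X_s, y_s = sample_per_class(X_demo, y_demo, pct=0.5)
print(X_s.shape, np.bincount(y_s))   # (15, 32, 32, 3) [ 5 10]
```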

In [15]:
X_sample_train, y_sample_train = create_sample_set(X_train_group_by_label, X_train, y_train, pct=0.33)
print("Sample training images dimensions={0}, labels dimensions={1}".format(X_sample_train.shape, y_sample_train.shape))

X_sample_valid, y_sample_valid = create_sample_set(X_valid_group_by_label, X_valid, y_valid, pct=0.33)
print("Sample validation images dimensions={0}, labels dimensions={1}".format(X_sample_valid.shape, y_sample_valid.shape))
Sample training images dimensions=(11464, 32, 32, 3), labels dimensions=(11464,)
Sample validation images dimensions=(1427, 32, 32, 3), labels dimensions=(1427,)

We can see that some images are very dark, and we should therefore look at applying a pre-processing step to brighten them if possible.


Step 2: Design and Test a Model Architecture

Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.

The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point. You'll have to change the number of classes and possibly the preprocessing, but aside from that it's plug and play!

With the LeNet-5 solution from the lecture, you should expect a validation set accuracy of about 0.89. To meet specifications, the validation set accuracy will need to be at least 0.93. It is possible to get an even higher accuracy, but 0.93 is the minimum for a successful project submission.

There are various aspects to consider when thinking about this problem:

  • Neural network architecture (is the network over or underfitting?)
  • Play around with preprocessing techniques (normalization, RGB to grayscale, etc.)
  • Number of examples per label (some have more than others).
  • Generate fake data.

Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.

Data Pre-Processing

Minimally, the image data should be normalized so that the data has mean zero and equal variance. For image data, (pixel - 128)/ 128 is a quick way to approximately normalize the data and can be used in this project.
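The quick normalization from the brief is shown below alongside a mean/std variant similar to what normalise_images presumably does (the exact implementation lives in utils, so the second function is an assumption about its behaviour):

```python
import numpy as np

def normalise_quick(imgs):
    """Quick approximate normalization suggested in the brief: (pixel - 128) / 128."""
    return (imgs.astype(np.float32) - 128.0) / 128.0

def normalise_mean_std(imgs, ref):
    """Zero-mean, unit-variance normalization using the reference (training) set's statistics.
    Assumption: utils.normalise_images(X, X_train) does something along these lines."""
    ref = ref.astype(np.float32)
    return (imgs.astype(np.float32) - ref.mean()) / ref.std()

demo = np.full((2, 32, 32, 3), 128, dtype=np.uint8)
print(normalise_quick(demo).mean())   # 0.0: pixels at 128 map exactly to zero
```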

Other pre-processing steps are optional. You can try different techniques to see if it improves performance.

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project.

Normalisation

In [16]:
# Normalise training set
X_train_normalised = normalise_images(X_train, X_train)
In [17]:
# Do the same for validation and test sets, using the mean and standard deviation of the training set
X_valid_normalised = normalise_images(X_valid, X_train) 
X_test_normalised = normalise_images(X_test, X_train) 
In [18]:
show_random_dataset_images(X_valid_group_by_label, X_valid_normalised)

Grayscale

In [19]:
# Let's test our grayscaling on the training set
X_train_grayscale = np.asarray(list(map(lambda img: to_grayscale(img), X_train))) 
show_random_dataset_images(X_train_group_by_label, X_train_grayscale)

We can see that grayscaling somewhat improves the visibility of the images, but some of them still remain very dark. At this stage we will not address this problem.
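The to_grayscale helper is defined in utils and not shown. A minimal numpy sketch using the standard luminosity weights (an assumption — the utility may instead call cv2.cvtColor, which applies the same BT.601 weights for RGB input):

```python
import numpy as np

def to_grayscale_sketch(img):
    # ITU-R BT.601 luminosity weights, the same ones cv2 uses for RGB -> gray conversion
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return (img.astype(np.float32) @ weights).astype(np.uint8)

demo = np.full((32, 32, 3), 200, dtype=np.uint8)
gray = to_grayscale_sketch(demo)
print(gray.shape)   # (32, 32): the channel dimension is collapsed
```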

In [20]:
# Do the same on validation set
X_valid_grayscale = np.asarray(list(map(lambda img: to_grayscale(img), X_valid))) 
#show_random_dataset_images(X_valid_group_by_label, X_valid_grayscale)
In [21]:
# Also grayscale test data
X_test_grayscale = np.asarray(list(map(lambda img: to_grayscale(img), X_test))) 
#show_image_list(X_test_grayscale[10:13], y_test[10:13], "Gray image", cols=3, fig_size=(3, 3), show_ticks=False)

Model Architectures

In this section, we will build a configurable neural network architecture in which the number of layers as well as the filter sizes can be modified. But first, we set up the pipeline that enables us to achieve our goal.

Configuration and Execution Pipeline

We define the steps necessary to build a training pipeline in Tensorflow. These are defined as classes to increase flexibility.

In [22]:
class ModelConfig:
    """
    ModelConfig is a utility class that stores important configuration options for our model
    """

    def __init__(self, model, name, input_img_dimensions, conv_layers_config, fc_output_dims, output_classes, dropout_keep_pct):
        self.model = model
        self.name=name
        self.input_img_dimensions = input_img_dimensions
        
        # Determines the wxh dimension of filters, the starting depth (increases by x2 at every layer) 
        # and how many convolutional layers the network has
        self.conv_filter_size = conv_layers_config[0]
        self.conv_depth_start = conv_layers_config[1]
        self.conv_layers_count = conv_layers_config[2]


        self.fc_output_dims = fc_output_dims 
        self.output_classes = output_classes

        # Try with different values for drop out at convolutional and fully connected layers
        self.dropout_conv_keep_pct = dropout_keep_pct[0]
        self.dropout_fc_keep_pct = dropout_keep_pct[1]
In [23]:
class ModelExecutor:
    """
    ModelExecutor is responsible for executing the supplied model
    """
    def __init__(self, model_config, learning_rate=0.001):
        self.model_config = model_config
        self.learning_rate = learning_rate
    
        self.graph = tf.Graph()
        with self.graph.as_default() as g:
            with g.name_scope( self.model_config.name ) as scope:
                # Create Model operations
                self.create_model_operations()

                # Create a saver to persist the results of execution
                self.saver = tf.train.Saver()
    
    def create_placeholders(self):
        """
        Defining our placeholder variables:
            - x, y
            - one_hot_y
            - dropout placeholders
        """

        # e.g. 32 * 32 * 3
        input_dims = self.model_config.input_img_dimensions
        self.x = tf.placeholder(tf.float32, (None, input_dims[0], input_dims[1], input_dims[2]), name="{0}_x".format(self.model_config.name))
        self.y = tf.placeholder(tf.int32, (None), name="{0}_y".format(self.model_config.name))
        self.one_hot_y = tf.one_hot(self.y, self.model_config.output_classes)

        self.dropout_placeholder_conv = tf.placeholder(tf.float32)
        self.dropout_placeholder_fc = tf.placeholder(tf.float32)

    
    def create_model_operations(self):
        """
        Sets up all operations needed to run the deep learning pipeline
        """

        # First step is to set our x, y, etc
        self.create_placeholders()

        cnn = self.model_config.model

        # Build the network -  TODO: pass the configuration in the future
        self.logits = cnn(self.x, self.model_config, self.dropout_placeholder_conv, self.dropout_placeholder_fc)
        # Obviously, using softmax as the activation function for final layer
        self.cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=self.one_hot_y, logits=self.logits)
        # Average the loss across the batch
        self.loss_operation = tf.reduce_mean(self.cross_entropy)
        # What method do we use to reduce our loss?
        self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate)
        # What do we really do in a training operation then? Answer: we attempt to reduce the loss using our chosen optimizer
        self.training_operation = self.optimizer.minimize(self.loss_operation)

        # Get the top prediction for model against labels and check whether they match
        self.correct_prediction = tf.equal(tf.argmax(self.logits, 1), tf.argmax(self.one_hot_y, 1))
        # Compute accuracy at batch level
        self.accuracy_operation = tf.reduce_mean(tf.cast(self.correct_prediction, tf.float32))

        # Compute the prediction when we don't have a matching label
        self.prediction = tf.argmax(self.logits, 1)
        # Registering our top 5 predictions
        self.top5_predictions = tf.nn.top_k(tf.nn.softmax(self.logits), k=5, sorted=True, name=None)

        
    def evaluate_model(self, X_data, Y_data, batch_size):
        """
        Evaluates the model's accuracy and loss for the supplied dataset. 
        Naturally, Dropout is ignored in this case (i.e. we set dropout_keep_pct to 1.0)
        """
        
        num_examples = len(X_data)
        total_accuracy = 0.0
        total_loss = 0.0
        sess = tf.get_default_session()

        for offset in range(0, num_examples, batch_size):
            batch_x, batch_y = X_data[offset:offset+batch_size], Y_data[offset:offset+batch_size]
            
            # Compute both accuracy and loss for this batch
            accuracy = sess.run(self.accuracy_operation, 
                                feed_dict={
                                    self.dropout_placeholder_conv: 1.0,
                                    self.dropout_placeholder_fc: 1.0,
                                    self.x: batch_x, 
                                    self.y: batch_y
                                           })
            loss = sess.run(self.loss_operation, feed_dict={
                                    self.dropout_placeholder_conv: 1.0,
                                    self.dropout_placeholder_fc: 1.0,
                                    self.x: batch_x, 
                                    self.y: batch_y
                                           })

            # Weighting accuracy by the total number of elements in batch
            total_accuracy += (accuracy * len(batch_x))        
            total_loss += (loss * len(batch_x))            

        # To produce a true mean accuracy over whole dataset
        return (total_accuracy / num_examples, total_loss / num_examples) 
    
    
    def train_model(self, X_train_features, X_train_labels, X_valid_features, y_valid_labels, batch_size=512, epochs=100, PRINT_FREQ=10):
        """
        Trains the model for the specified number of epochs on the supplied training data
        """

        # Create our array of metrics    
        training_metrics = np.zeros((epochs, 3))
        validation_metrics = np.zeros((epochs, 3))


        with tf.Session(graph = self.graph, config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:            
            
            sess.run(tf.global_variables_initializer())
            num_examples = len(X_train_features)


            print("Training {0} [epochs={1}, batch_size={2}]...\n".format(self.model_config.name, epochs, batch_size))

            for i in range(epochs):
                start = time.time()
                X_train, Y_train = shuffle(X_train_features, X_train_labels)
                for offset in range(0, num_examples, batch_size):
                    end = offset + batch_size
                    batch_x, batch_y = X_train[offset:end], Y_train[offset:end]
                    sess.run(self.training_operation, feed_dict={
                        self.x: batch_x, 
                        self.y: batch_y, 
                        self.dropout_placeholder_conv: self.model_config.dropout_conv_keep_pct,
                        self.dropout_placeholder_fc: self.model_config.dropout_fc_keep_pct,

                    })

                end_training_time = time.time()
                training_duration = end_training_time - start

                #computing training accuracy
                training_accuracy, training_loss = self.evaluate_model(X_train_features, X_train_labels, batch_size)            


                # Computing validation accuracy 
                validation_accuracy, validation_loss = self.evaluate_model(X_valid_features, y_valid_labels, batch_size)


                end_epoch_time = time.time()            
                validation_duration = end_epoch_time - end_training_time   
                epoch_duration = end_epoch_time - start

                if i == 0 or (i+1) % PRINT_FREQ == 0:
                    print("[{0}]\ttotal={1:.3f}s | train: time={2:.3f}s, loss={3:.4f}, acc={4:.4f} | val: time={5:.3f}s, loss={6:.4f}, acc={7:.4f}".format(
                          i+1, epoch_duration, training_duration, training_loss, training_accuracy, 
                        validation_duration, validation_loss, validation_accuracy))

                training_metrics[i] = [training_duration, training_loss, training_accuracy]
                validation_metrics[i] = [validation_duration, validation_loss, validation_accuracy]

            model_file_name = "{0}{1}.chkpt".format(models_path, self.model_config.name)
            # Save the model
            self.saver.save(sess, model_file_name)
            print("Model {0} saved".format(model_file_name))

        return (training_metrics, validation_metrics, epoch_duration)
    
    def test_model(self, test_imgs, test_lbs, batch_size=512):
        """
        Evaluates the model with the test dataset and test labels
        Returns the tuple (test_accuracy, test_loss, duration)
        """

        with tf.Session(graph = self.graph) as sess:
            # No explicit initialisation needed: variables are restored from the checkpoint below

            
            model_file_name = "{0}{1}.chkpt".format(models_path, self.model_config.name)
            self.saver.restore(sess, model_file_name)

            start = time.time()
            (test_accuracy, test_loss) = self.evaluate_model(test_imgs, test_lbs, batch_size)
            duration = time.time() - start
            print("[{0} - Test Set]\ttime={1:.3f}s, loss={2:.4f}, acc={3:.4f}".format(self.model_config.name, duration, test_loss, test_accuracy))

        
        return (test_accuracy, test_loss, duration)
    
    def predict(self, imgs, top_5=False):
        """
        Returns the predictions associated with a bunch of images
        """
        preds = None
        with tf.Session(graph = self.graph) as sess:
            # No explicit initialisation needed: variables are restored from the checkpoint below

            model_file_name = "{0}{1}.chkpt".format(models_path, self.model_config.name)
            self.saver.restore(sess, model_file_name)
            
            if top_5:
                preds = sess.run(self.top5_predictions, feed_dict={
                        self.x: imgs,
                        self.dropout_placeholder_conv: 1.0,
                        self.dropout_placeholder_fc: 1.0
                    })
            else:
                preds = sess.run(self.prediction, feed_dict={
                    self.x: imgs,
                    self.dropout_placeholder_conv: 1.0,
                    self.dropout_placeholder_fc: 1.0
                })
    
        return preds
    
    def show_conv_feature_maps(self, img, conv_layer_idx=0, activation_min=-1, activation_max=-1 ,
                               plt_num=1, fig_size=(15, 15), title_y_pos=1.0):
        """
        Shows the resulting feature maps at a given convolutional level for a SINGLE image
        """
        with tf.Session(graph = self.graph) as sess:
            # No explicit initialisation needed: variables are restored from the checkpoint below

            model_file_name = "{0}{1}.chkpt".format(models_path, self.model_config.name)
            self.saver.restore(sess, model_file_name)
            
            # Run a prediction
            preds = sess.run(self.prediction, feed_dict={
                    self.x: np.reshape(img, [1, img.shape[0], img.shape[1], img.shape[2]]),
                    self.dropout_placeholder_conv: 1.0,
                    self.dropout_placeholder_fc: 1.0
                })

            
            var_name = "{0}/conv_{1}_relu:0".format(self.model_config.name, conv_layer_idx)
            print("Fetching tensor: {0}".format(var_name))
            conv_layer = tf.get_default_graph().get_tensor_by_name(var_name)
            
            activation = sess.run(conv_layer, feed_dict={
                            self.x: np.reshape(img, [1, img.shape[0], img.shape[1], img.shape[2]]),
                            self.dropout_placeholder_conv: 1.0,
                            self.dropout_placeholder_fc: 1.0
                            })
            featuremaps = activation.shape[-1]
            # (1, 13, 13, 64)
            print("Shape of activation layer: {0}".format(activation.shape))
            
            #fix the number of columns
            cols = 8
            rows = featuremaps // cols
            fig, axes = plt.subplots(rows, cols, figsize=fig_size)
            k = 0
            for i in range(0, rows):
                for j in range(0, cols):
                    ax = axes[i, j]
                    featuremap = k

                    if activation_min != -1 and activation_max != -1:
                        ax.imshow(activation[0,:,:, featuremap], interpolation="nearest", vmin =activation_min, vmax=activation_max, cmap="gray")
                    elif activation_max != -1:
                        ax.imshow(activation[0,:,:, featuremap], interpolation="nearest", vmax=activation_max, cmap="gray")
                    elif activation_min !=-1:
                        ax.imshow(activation[0,:,:, featuremap], interpolation="nearest", vmin=activation_min, cmap="gray")
                    else:
                        ax.imshow(activation[0,:,:, featuremap], interpolation="nearest", cmap="gray")
                    
                    ax.axis("off")
                    k += 1
            
            fig.suptitle("Feature Maps at layer: {0}".format(var_name), fontsize=12, fontweight='bold', y=title_y_pos)
            fig.tight_layout()
            plt.show()
                            

EdLeNet

We implement a simple variant of Yann LeCun's LeNet, playfully named EdLeNet. We make the network highly configurable so that we can assess its different variants and determine which one provides the best accuracy.

In [24]:
from tensorflow.contrib.layers import flatten

def EdLeNet(x, mc, dropout_conv_pct, dropout_fc_pct):
    """
    A variant of LeNet by Yann LeCun.
    The second parameter, which encapsulates the model configuration, enables varying the convolution filter sizes
    as well as the number of fully connected layers and their output dimensions.
    The third and fourth parameters are dropout placeholders for the convolutional and fully connected layers respectively.
    """
    
    # Used for randomly defining weights and biases
    mu = 0
    sigma = 0.1
    
    prev_conv_layer = x
    conv_depth = mc.conv_depth_start
    conv_input_depth = mc.input_img_dimensions[-1]
   
    print("[EdLeNet] Building neural network [conv layers={0}, conv filter size={1}, conv start depth={2}, fc layers={3}]".format(
        mc.conv_layers_count, mc.conv_filter_size, conv_depth, len(mc.fc_output_dims)))
    
    for i in range(0, mc.conv_layers_count):
        # layer depth grows exponentially
        conv_output_depth = conv_depth * (2 ** (i))
        conv_W = tf.Variable(tf.truncated_normal(shape=(mc.conv_filter_size, mc.conv_filter_size, conv_input_depth, conv_output_depth), 
                                              mean = mu, stddev = sigma))
        conv_b = tf.Variable(tf.zeros(conv_output_depth))
        
        conv_output = tf.nn.conv2d(prev_conv_layer, conv_W, strides=[1, 1, 1, 1], padding='VALID', name="conv_{0}".format(i)) + conv_b
        conv_output = tf.nn.relu(conv_output, name="conv_{0}_relu".format(i))
        # Traditional max 2x2 pool 
        conv_output = tf.nn.max_pool(conv_output, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
        # Apply dropout - even at the conv level
        conv_output = tf.nn.dropout(conv_output, dropout_conv_pct) 
        
        # Setting our loop variables accordingly
        prev_conv_layer = conv_output
        conv_input_depth = conv_output_depth

    
    # Flatten results of second convolutional layer so that it can be supplied to fully connected layer
    fc0   = flatten(prev_conv_layer)
    
    # Now creating our fully connected layers
    prev_layer = fc0
    for output_dim in mc.fc_output_dims:
        fcn_W  = tf.Variable(tf.truncated_normal(shape=(prev_layer.get_shape().as_list()[-1], output_dim), 
                                                 mean = mu, stddev = sigma))
        fcn_b  = tf.Variable(tf.zeros(output_dim))
        
        prev_layer   = tf.nn.dropout(tf.nn.relu(tf.matmul(prev_layer, fcn_W) + fcn_b), dropout_fc_pct)

    
    # Final layer (Fully Connected)
    fc_final_W  = tf.Variable(tf.truncated_normal(shape=(prev_layer.get_shape().as_list()[-1], mc.output_classes), 
                                                  mean = mu, stddev = sigma))
    fc_final_b  = tf.Variable(tf.zeros(mc.output_classes))
    logits = tf.matmul(prev_layer, fc_final_W) + fc_final_b
    
    return logits

Train, Validate and Test the Model

Non-Normalised Images

Let's begin with color and grayscale non-normalised images, varying the convolution filter size.

Color Sample Set

In [25]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_3x3_Color_Sample", [32, 32, 3], [3, 16, 3], [120, 84], n_classes, [1.0, 1.0])
mc_5x5 = ModelConfig(EdLeNet, "EdLeNet_5x5_Color_Sample", [32, 32, 3], [5, 16, 2], [120, 84], n_classes, [1.0, 1.0])
mc_7x7 = ModelConfig(EdLeNet, "EdLeNet_7x7_Color_Sample", [32, 32, 3], [7, 16, 2], [120, 84], n_classes, [1.0, 1.0])
In [26]:
me_c_sample_3x3 = ModelExecutor(mc_3x3)
me_c_sample_5x5 = ModelExecutor(mc_5x5)
me_c_sample_7x7 = ModelExecutor(mc_7x7)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=16, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=5, conv start depth=16, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=7, conv start depth=16, fc layers=2]
In [27]:
(c_sample_3x3_tr_metrics, c_sample_3x3_val_metrics, c_sample_3x3_duration) = me_c_sample_3x3.train_model(X_sample_train, y_sample_train, X_sample_valid, y_sample_valid, epochs=50)
(c_sample_3x3_ts_accuracy, c_sample_3x3_ts_loss, c_sample_3x3_ts_duration) =  me_c_sample_3x3.test_model(X_test, y_test)
Training EdLeNet_3x3_Color_Sample [epochs=50, batch_size=512]...

[1]	total=1.871s | train: time=1.317s, loss=3.6156, acc=0.0800 | val: time=0.554s, loss=3.8195, acc=0.0722
[10]	total=1.190s | train: time=0.673s, loss=0.5105, acc=0.8609 | val: time=0.517s, loss=1.4129, acc=0.6924
[20]	total=1.191s | train: time=0.674s, loss=0.0910, acc=0.9810 | val: time=0.517s, loss=1.4134, acc=0.7877
[30]	total=1.186s | train: time=0.672s, loss=0.0341, acc=0.9947 | val: time=0.514s, loss=1.8728, acc=0.7842
[40]	total=1.193s | train: time=0.677s, loss=0.0089, acc=0.9997 | val: time=0.516s, loss=1.7286, acc=0.7982
[50]	total=1.191s | train: time=0.674s, loss=0.0035, acc=1.0000 | val: time=0.517s, loss=1.9538, acc=0.7996
Model ./models/EdLeNet_3x3_Color_Sample.chkpt saved
[EdLeNet_3x3_Color_Sample - Test Set]	time=0.546s, loss=1.7775, acc=0.8309
In [28]:
(c_sample_5x5_tr_metrics, c_sample_5x5_val_metrics, c_sample_5x5_duration) = me_c_sample_5x5.train_model(X_sample_train, y_sample_train, X_sample_valid, y_sample_valid, epochs=50)
(c_sample_5x5_ts_accuracy, c_sample_5x5_ts_loss, c_sample_5x5_ts_duration) =  me_c_sample_5x5.test_model(X_test, y_test)
Training EdLeNet_5x5_Color_Sample [epochs=50, batch_size=512]...

[1]	total=1.210s | train: time=0.730s, loss=5.4311, acc=0.1009 | val: time=0.480s, loss=6.1649, acc=0.0862
[10]	total=1.054s | train: time=0.602s, loss=0.2936, acc=0.9206 | val: time=0.452s, loss=1.2362, acc=0.7260
[20]	total=1.055s | train: time=0.606s, loss=0.0265, acc=0.9958 | val: time=0.450s, loss=1.1434, acc=0.8115
[30]	total=1.056s | train: time=0.604s, loss=0.0050, acc=0.9999 | val: time=0.452s, loss=1.2906, acc=0.8248
[40]	total=1.055s | train: time=0.604s, loss=0.0019, acc=1.0000 | val: time=0.450s, loss=1.3826, acc=0.8290
[50]	total=1.053s | train: time=0.604s, loss=0.0010, acc=1.0000 | val: time=0.450s, loss=1.4611, acc=0.8318
Model ./models/EdLeNet_5x5_Color_Sample.chkpt saved
[EdLeNet_5x5_Color_Sample - Test Set]	time=0.468s, loss=1.9586, acc=0.8294
In [29]:
(c_sample_7x7_tr_metrics, c_sample_7x7_val_metrics, c_sample_7x7_duration) = me_c_sample_7x7.train_model(X_sample_train, y_sample_train, X_sample_valid, y_sample_valid, epochs=50)
(c_sample_7x7_ts_accuracy, c_sample_7x7_ts_loss, c_sample_7x7_ts_duration) =  me_c_sample_7x7.test_model(X_test, y_test)
Training EdLeNet_7x7_Color_Sample [epochs=50, batch_size=512]...

[1]	total=1.197s | train: time=0.752s, loss=4.3703, acc=0.1184 | val: time=0.445s, loss=4.5373, acc=0.1072
[10]	total=1.003s | train: time=0.592s, loss=0.5907, acc=0.8412 | val: time=0.412s, loss=1.3450, acc=0.7001
[20]	total=1.004s | train: time=0.594s, loss=0.1028, acc=0.9744 | val: time=0.410s, loss=1.2583, acc=0.7891
[30]	total=1.006s | train: time=0.594s, loss=0.0246, acc=0.9968 | val: time=0.412s, loss=1.3176, acc=0.8178
[40]	total=1.005s | train: time=0.594s, loss=0.0147, acc=0.9986 | val: time=0.411s, loss=1.3916, acc=0.8297
[50]	total=1.003s | train: time=0.593s, loss=0.0038, acc=1.0000 | val: time=0.410s, loss=1.4954, acc=0.8283
Model ./models/EdLeNet_7x7_Color_Sample.chkpt saved
[EdLeNet_7x7_Color_Sample - Test Set]	time=0.434s, loss=2.2401, acc=0.8241

The 3x3 model achieves the best test accuracy and the lowest test loss of the three. We may be tempted to focus on it exclusively, but instead we will run the 3x3 and 5x5 models side by side, dropping the 7x7 model, which produces the worst results.

In [30]:
metrics_arr = [c_sample_7x7_tr_metrics, c_sample_7x7_val_metrics, c_sample_5x5_tr_metrics, c_sample_5x5_val_metrics, c_sample_3x3_tr_metrics, c_sample_3x3_val_metrics]
lbs = ["7x7 training", "7x7 validation", "5x5 training", "5x5 validation", "3x3 training", "3x3 validation"]
plot_model_results(metrics_arr, [2, 1], lbs, ["Epochs", "Epochs"], ["Accuracy", "Loss"], 
                   ["Accuracy vs Epochs", 
                    "Loss vs Epochs"],
                   "Color  - Accuracy and Loss of models on sample set"
                    , fig_size=(17, 5))     
Length of axis: (2,)

Color Full Dataset

We now run the 3x3 and 5x5 models on the full color dataset. We also increase the starting depth of our convolutional layers from 16 to 32, as we have observed that the deeper configuration extracts more useful information.
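As a rough illustration, the configuration `[3, 32, 3]` presumably encodes [filter size, starting depth, number of conv layers], matching the build log printed above. The following hypothetical shape walk-through assumes 'valid' convolutions, 2x2 max pooling, and depth doubling at each layer; the actual EdLeNet implementation in utils.py may differ:

```python
def conv_stack_shapes(input_hw, filter_size, start_depth, n_layers):
    """Walk through the feature-map shapes of a conv stack, assuming each
    layer applies a 'valid'-padded convolution followed by 2x2 max pooling,
    with the depth doubling per layer. Illustrative only."""
    hw, depth, shapes = input_hw, start_depth, []
    for _ in range(n_layers):
        hw = (hw - filter_size + 1) // 2  # conv ('valid') then 2x2 pool
        shapes.append((hw, hw, depth))
        depth *= 2
    return shapes

print(conv_stack_shapes(32, 3, 32, 3))  # [(15, 15, 32), (6, 6, 64), (2, 2, 128)]
print(conv_stack_shapes(32, 5, 32, 2))  # [(14, 14, 32), (5, 5, 64)]
```

Under these assumptions, the 3x3 variant ends with smaller but deeper feature maps than the 5x5 one.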

In [31]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Full_Color_3x3", [32, 32, 3], [3, 32, 3], [120, 84], n_classes, [1.0, 1.0])
mc_5x5 = ModelConfig(EdLeNet, "EdLeNet_Full_Color_5x5", [32, 32, 3], [5, 32, 2], [120, 84], n_classes, [1.0, 1.0])
In [32]:
me_c_3x3 = ModelExecutor(mc_3x3)
me_c_5x5 = ModelExecutor(mc_5x5)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=5, conv start depth=32, fc layers=2]
In [33]:
(c_3x3_tr_metrics, c_3x3_val_metrics, c_3x3_duration) = me_c_3x3.train_model(X_train, y_train, X_valid, y_valid, epochs=100)
(c_3x3_ts_accuracy, c_3x3_ts_loss, c_3x3_ts_duration) =  me_c_3x3.test_model(X_test, y_test)
Training EdLeNet_Full_Color_3x3 [epochs=100, batch_size=512]...

[1]	total=6.053s | train: time=3.624s, loss=2.6746, acc=0.3394 | val: time=2.428s, loss=2.9638, acc=0.2952
[10]	total=5.569s | train: time=3.204s, loss=0.0653, acc=0.9837 | val: time=2.365s, loss=0.7756, acc=0.8628
[20]	total=5.574s | train: time=3.205s, loss=0.0072, acc=0.9995 | val: time=2.369s, loss=0.7404, acc=0.8998
[30]	total=5.576s | train: time=3.208s, loss=0.0012, acc=1.0000 | val: time=2.368s, loss=0.8910, acc=0.8982
[40]	total=5.574s | train: time=3.207s, loss=0.0004, acc=1.0000 | val: time=2.367s, loss=0.9336, acc=0.8993
[50]	total=5.577s | train: time=3.209s, loss=0.0002, acc=1.0000 | val: time=2.368s, loss=0.9999, acc=0.8998
[60]	total=5.571s | train: time=3.206s, loss=0.0001, acc=1.0000 | val: time=2.364s, loss=1.0440, acc=0.9002
[70]	total=5.575s | train: time=3.209s, loss=0.0001, acc=1.0000 | val: time=2.366s, loss=1.0927, acc=0.8989
[80]	total=5.575s | train: time=3.207s, loss=0.0000, acc=1.0000 | val: time=2.368s, loss=1.1314, acc=0.9011
[90]	total=5.581s | train: time=3.209s, loss=0.0000, acc=1.0000 | val: time=2.372s, loss=1.1653, acc=0.9002
[100]	total=5.578s | train: time=3.210s, loss=0.0000, acc=1.0000 | val: time=2.368s, loss=1.2076, acc=0.9020
Model ./models/EdLeNet_Full_Color_3x3.chkpt saved
[EdLeNet_Full_Color_3x3 - Test Set]	time=0.797s, loss=1.7482, acc=0.8819
In [34]:
(c_5x5_tr_metrics, c_5x5_val_metrics, c_5x5_duration) = me_c_5x5.train_model(X_train, y_train, X_valid, y_valid, epochs=100)
(c_5x5_ts_accuracy, c_5x5_ts_loss, c_5x5_ts_duration) =  me_c_5x5.test_model(X_test, y_test)
Training EdLeNet_Full_Color_5x5 [epochs=100, batch_size=512]...

[1]	total=5.188s | train: time=3.108s, loss=3.3284, acc=0.1896 | val: time=2.080s, loss=3.5038, acc=0.1814
[10]	total=4.852s | train: time=2.803s, loss=0.1363, acc=0.9624 | val: time=2.050s, loss=1.1154, acc=0.8134
[20]	total=4.855s | train: time=2.807s, loss=0.0367, acc=0.9908 | val: time=2.049s, loss=1.2610, acc=0.8410
[30]	total=4.855s | train: time=2.805s, loss=0.0138, acc=0.9970 | val: time=2.050s, loss=1.3468, acc=0.8764
[40]	total=4.847s | train: time=2.801s, loss=0.0098, acc=0.9976 | val: time=2.045s, loss=1.2302, acc=0.8893
[50]	total=4.843s | train: time=2.800s, loss=0.0086, acc=0.9974 | val: time=2.043s, loss=1.3448, acc=0.8780
[60]	total=4.840s | train: time=2.800s, loss=0.0766, acc=0.9847 | val: time=2.041s, loss=1.3811, acc=0.8873
[70]	total=4.835s | train: time=2.794s, loss=0.0002, acc=1.0000 | val: time=2.042s, loss=1.3315, acc=0.9141
[80]	total=4.844s | train: time=2.794s, loss=0.0001, acc=1.0000 | val: time=2.050s, loss=1.4431, acc=0.9132
[90]	total=4.845s | train: time=2.797s, loss=0.0001, acc=1.0000 | val: time=2.048s, loss=1.5175, acc=0.9132
[100]	total=4.842s | train: time=2.796s, loss=0.0000, acc=1.0000 | val: time=2.045s, loss=1.5828, acc=0.9120
Model ./models/EdLeNet_Full_Color_5x5.chkpt saved
[EdLeNet_Full_Color_5x5 - Test Set]	time=0.703s, loss=1.5240, acc=0.9145
In [35]:
metrics_arr = [c_5x5_tr_metrics, c_5x5_val_metrics, c_3x3_tr_metrics, c_3x3_val_metrics]
lbs = ["5x5 training", "5x5 validation", "3x3 training", "3x3 validation"]
plot_model_results(metrics_arr, [2, 1], lbs, ["Epochs", "Epochs"], ["Accuracy", "Loss"], 
                   ["Accuracy vs Epochs", 
                    "Loss vs Epochs"],
                   "Color  - Accuracy and Loss of models"
                    , fig_size=(17, 5))     
Length of axis: (2,)

Interestingly, the 5x5 model shows the most overfitting, producing significantly higher validation losses than the 3x3 model, even though it edges ahead on validation and test accuracy; neither model quite reaches 93%. The 3x3 model's lower loss could be due to it being a deeper network that uses finer convolutions, and is therefore able to extract more features. We are next going to test the models on grayscale images.
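For reference, a common way to convert a batch of RGB images to grayscale uses the standard luminosity weights; this is only a plausible sketch and may differ from the conversion helper the notebook actually uses:

```python
import numpy as np

def to_grayscale(images):
    """Convert a batch of RGB images (N, H, W, 3) to grayscale (N, H, W)
    using the standard luminosity weights. A plausible sketch, not
    necessarily the notebook's own implementation."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.tensordot(images.astype(np.float32), weights, axes=([-1], [0]))

# Example: a hypothetical batch of two 32x32 RGB images
batch = np.random.randint(0, 256, size=(2, 32, 32, 3))
gray = to_grayscale(batch)
print(gray.shape)  # (2, 32, 32)
```

The result would then need the 4D reshape shown in the next code cell before being fed to TensorFlow.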

Grayscale Full Dataset

In [36]:
# Reshaping our inputs to a format that tensorflow accepts (i.e. a 4D tensor)
X_train_grayscale = np.reshape(X_train_grayscale, (X_train_grayscale.shape[0], 32, 32, 1))
X_valid_grayscale = np.reshape(X_valid_grayscale, (X_valid_grayscale.shape[0], 32, 32, 1))
X_test_grayscale = np.reshape(X_test_grayscale, (X_test_grayscale.shape[0], 32, 32, 1))
In [37]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Grayscale_3x3", [32, 32, 1], [3, 32, 3], [120, 84], n_classes, [1.0, 1.0])
mc_5x5 = ModelConfig(EdLeNet, "EdLeNet_Grayscale_5x5", [32, 32, 1], [5, 32, 2], [120, 84], n_classes, [1.0, 1.0])
In [38]:
me_g_3x3 = ModelExecutor(mc_3x3)
me_g_5x5 = ModelExecutor(mc_5x5)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=5, conv start depth=32, fc layers=2]
In [39]:
(g_3x3_tr_metrics, g_3x3_val_metrics, g_3x3_duration) = me_g_3x3.train_model(X_train_grayscale, y_train, X_valid_grayscale, y_valid, epochs=100)
(g_3x3_ts_accuracy, g_3x3_ts_loss, g_3x3_ts_duration) =  me_g_3x3.test_model(X_test_grayscale, y_test)
Training EdLeNet_Grayscale_3x3 [epochs=100, batch_size=512]...

[1]	total=5.204s | train: time=3.146s, loss=2.9789, acc=0.2538 | val: time=2.059s, loss=3.2280, acc=0.2172
[10]	total=5.029s | train: time=3.005s, loss=0.0784, acc=0.9818 | val: time=2.024s, loss=0.8549, acc=0.8569
[20]	total=5.028s | train: time=3.003s, loss=0.0131, acc=0.9977 | val: time=2.026s, loss=0.9449, acc=0.8853
[30]	total=5.031s | train: time=3.002s, loss=0.0023, acc=0.9999 | val: time=2.029s, loss=0.8964, acc=0.9073
[40]	total=5.029s | train: time=3.002s, loss=0.0023, acc=0.9998 | val: time=2.027s, loss=0.8670, acc=0.9143
[50]	total=5.027s | train: time=3.002s, loss=0.0004, acc=1.0000 | val: time=2.025s, loss=1.0178, acc=0.9141
[60]	total=5.029s | train: time=3.002s, loss=0.0002, acc=1.0000 | val: time=2.027s, loss=1.0914, acc=0.9147
[70]	total=5.028s | train: time=3.001s, loss=0.0001, acc=1.0000 | val: time=2.027s, loss=1.1599, acc=0.9154
[80]	total=5.030s | train: time=3.005s, loss=0.0001, acc=1.0000 | val: time=2.025s, loss=1.2026, acc=0.9170
[90]	total=5.028s | train: time=3.001s, loss=0.0000, acc=1.0000 | val: time=2.027s, loss=1.2448, acc=0.9175
[100]	total=5.029s | train: time=3.000s, loss=0.0000, acc=1.0000 | val: time=2.029s, loss=1.2988, acc=0.9179
Model ./models/EdLeNet_Grayscale_3x3.chkpt saved
[EdLeNet_Grayscale_3x3 - Test Set]	time=0.683s, loss=1.3432, acc=0.9165
In [40]:
(g_5x5_tr_metrics, g_5x5_val_metrics, g_5x5_duration) = me_g_5x5.train_model(X_train_grayscale, y_train, X_valid_grayscale, y_valid, epochs=100)
(g_5x5_ts_accuracy, g_5x5_ts_loss, g_5x5_ts_duration) =  me_g_5x5.test_model(X_test_grayscale, y_test)
Training EdLeNet_Grayscale_5x5 [epochs=100, batch_size=512]...

[1]	total=4.206s | train: time=2.523s, loss=3.2163, acc=0.2535 | val: time=1.683s, loss=3.5978, acc=0.2268
[10]	total=4.029s | train: time=2.371s, loss=0.0751, acc=0.9811 | val: time=1.658s, loss=1.1673, acc=0.8460
[20]	total=4.031s | train: time=2.374s, loss=0.0180, acc=0.9956 | val: time=1.657s, loss=0.9817, acc=0.8787
[30]	total=4.029s | train: time=2.372s, loss=0.0015, acc=1.0000 | val: time=1.657s, loss=1.2128, acc=0.8905
[40]	total=4.036s | train: time=2.379s, loss=0.0006, acc=1.0000 | val: time=1.657s, loss=1.3046, acc=0.8937
[50]	total=4.026s | train: time=2.369s, loss=0.0003, acc=1.0000 | val: time=1.657s, loss=1.3804, acc=0.8946
[60]	total=4.038s | train: time=2.376s, loss=0.0001, acc=1.0000 | val: time=1.662s, loss=1.4215, acc=0.8957
[70]	total=4.030s | train: time=2.375s, loss=0.0001, acc=1.0000 | val: time=1.655s, loss=1.4753, acc=0.8937
[80]	total=4.029s | train: time=2.372s, loss=0.0001, acc=1.0000 | val: time=1.657s, loss=1.5495, acc=0.8934
[90]	total=4.031s | train: time=2.372s, loss=0.0000, acc=1.0000 | val: time=1.659s, loss=1.6230, acc=0.8941
[100]	total=4.033s | train: time=2.374s, loss=0.0000, acc=1.0000 | val: time=1.659s, loss=1.6671, acc=0.8923
Model ./models/EdLeNet_Grayscale_5x5.chkpt saved
[EdLeNet_Grayscale_5x5 - Test Set]	time=0.559s, loss=2.2651, acc=0.8862
In [41]:
metrics_arr = [g_5x5_tr_metrics, g_5x5_val_metrics, g_3x3_tr_metrics, g_3x3_val_metrics]
lbs = ["5x5 training", "5x5 validation", "3x3 training", "3x3 validation"]
plot_model_results(metrics_arr, [2, 1], lbs, ["Epochs", "Epochs"], ["Accuracy", "Loss"], 
                   ["Accuracy vs Epochs", 
                    "Loss vs Epochs"],
                   "Grayscale  - Accuracy and Loss of models"
                    , fig_size=(17, 5))     
Length of axis: (2,)

Once again, the 3x3 model outperforms the 5x5 one on validation and test accuracy. We can see that both models' validation losses are increasing as the training accuracy improves. This is a strong suggestion that overfitting is occurring.
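One way to make this observation concrete is to track the epoch at which the validation loss bottoms out, which is where early stopping would halt training. A small illustrative helper, using the validation losses sampled from the 3x3 grayscale log above (the notebook itself trains for a fixed epoch count):

```python
def best_epoch(val_losses):
    """Return the index of the lowest validation loss -- the point where
    early stopping would halt training before overfitting sets in.
    Illustrative only; not part of the notebook's training loop."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])

# 3x3 grayscale validation losses sampled at epochs 10, 20, ..., 100
losses = [0.8549, 0.9449, 0.8964, 0.8670, 1.0178,
          1.0914, 1.1599, 1.2026, 1.2448, 1.2988]
print(best_epoch(losses))  # 0 -> the epoch-10 sample, the earliest checkpoint
```

The minimum sits at the very first sampled point, consistent with the validation loss rising almost monotonically afterwards.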

Normalised Images

We now assess the models on normalised images, which should presumably allow faster convergence on both the training and validation sets.

Color

In [45]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Color_Norm_3x3", [32, 32, 3], [3, 32, 3], [120, 84], n_classes, [1.0, 1.0])
mc_5x5 = ModelConfig(EdLeNet, "EdLeNet_Color_Norm_5x5", [32, 32, 3], [5, 32, 2], [120, 84], n_classes, [1.0, 1.0])
In [46]:
me_c_norm_3x3 = ModelExecutor(mc_3x3)
me_c_norm_5x5 = ModelExecutor(mc_5x5)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=5, conv start depth=32, fc layers=2]
In [47]:
(c_norm_3x3_tr_metrics, c_norm_3x3_val_metrics, c_norm_3x3_duration) = me_c_norm_3x3.train_model(X_train_normalised, y_train, X_valid_normalised, y_valid, epochs=100)
(c_norm_3x3_ts_accuracy, c_norm_3x3_ts_loss, c_norm_3x3_ts_duration) =  me_c_norm_3x3.test_model(X_test_normalised, y_test)
Training EdLeNet_Color_Norm_3x3 [epochs=100, batch_size=512]...

[1]	total=6.102s | train: time=3.684s, loss=1.2426, acc=0.6300 | val: time=2.418s, loss=1.6380, acc=0.5472
[10]	total=5.930s | train: time=3.530s, loss=0.0275, acc=0.9941 | val: time=2.400s, loss=0.6117, acc=0.8844
[20]	total=5.934s | train: time=3.529s, loss=0.0031, acc=0.9997 | val: time=2.405s, loss=0.5623, acc=0.9188
[30]	total=5.925s | train: time=3.526s, loss=0.0010, acc=1.0000 | val: time=2.400s, loss=0.5493, acc=0.9236
[40]	total=5.929s | train: time=3.532s, loss=0.0002, acc=1.0000 | val: time=2.398s, loss=0.6233, acc=0.9209
[50]	total=5.932s | train: time=3.530s, loss=0.0001, acc=1.0000 | val: time=2.402s, loss=0.6671, acc=0.9177
[60]	total=5.925s | train: time=3.528s, loss=0.0001, acc=1.0000 | val: time=2.397s, loss=0.6992, acc=0.9195
[70]	total=5.912s | train: time=3.518s, loss=0.0000, acc=1.0000 | val: time=2.394s, loss=0.7409, acc=0.9193
[80]	total=5.917s | train: time=3.517s, loss=0.0000, acc=1.0000 | val: time=2.400s, loss=0.7790, acc=0.9195
[90]	total=5.912s | train: time=3.516s, loss=0.0000, acc=1.0000 | val: time=2.396s, loss=0.7986, acc=0.9202
[100]	total=5.925s | train: time=3.527s, loss=0.0000, acc=1.0000 | val: time=2.398s, loss=0.8411, acc=0.9193
Model ./models/EdLeNet_Color_Norm_3x3.chkpt saved
[EdLeNet_Color_Norm_3x3 - Test Set]	time=0.800s, loss=0.8356, acc=0.9310
In [48]:
(c_norm_5x5_tr_metrics, c_norm_5x5_val_metrics, c_norm_5x5_duration) = me_c_norm_5x5.train_model(X_train_normalised, y_train, X_valid_normalised, y_valid, epochs=100)
(c_norm_5x5_ts_accuracy, c_norm_5x5_ts_loss, c_norm_5x5_ts_duration) =  me_c_norm_5x5.test_model(X_test_normalised, y_test)
Training EdLeNet_Color_Norm_5x5 [epochs=100, batch_size=512]...

[1]	total=5.345s | train: time=3.176s, loss=0.7543, acc=0.7937 | val: time=2.169s, loss=1.0707, acc=0.6927
[10]	total=5.328s | train: time=3.165s, loss=0.0156, acc=0.9979 | val: time=2.163s, loss=0.3969, acc=0.9113
[20]	total=5.333s | train: time=3.158s, loss=0.0016, acc=1.0000 | val: time=2.174s, loss=0.3508, acc=0.9324
[30]	total=5.322s | train: time=3.153s, loss=0.0037, acc=0.9992 | val: time=2.169s, loss=0.4272, acc=0.9342
[40]	total=5.325s | train: time=3.152s, loss=0.0001, acc=1.0000 | val: time=2.173s, loss=0.4205, acc=0.9413
[50]	total=5.330s | train: time=3.163s, loss=0.0001, acc=1.0000 | val: time=2.167s, loss=0.4333, acc=0.9422
[60]	total=5.321s | train: time=3.159s, loss=0.0000, acc=1.0000 | val: time=2.162s, loss=0.4521, acc=0.9413
[70]	total=5.331s | train: time=3.162s, loss=0.0000, acc=1.0000 | val: time=2.169s, loss=0.4703, acc=0.9420
[80]	total=5.317s | train: time=3.150s, loss=0.0000, acc=1.0000 | val: time=2.167s, loss=0.4851, acc=0.9408
[90]	total=5.327s | train: time=3.157s, loss=0.0000, acc=1.0000 | val: time=2.170s, loss=0.4982, acc=0.9420
[100]	total=5.318s | train: time=3.153s, loss=0.0000, acc=1.0000 | val: time=2.165s, loss=0.5158, acc=0.9417
Model ./models/EdLeNet_Color_Norm_5x5.chkpt saved
[EdLeNet_Color_Norm_5x5 - Test Set]	time=0.722s, loss=0.7890, acc=0.9299
In [49]:
metrics_arr = [c_norm_5x5_tr_metrics, c_norm_5x5_val_metrics, c_norm_3x3_tr_metrics, c_norm_3x3_val_metrics]
lbs = ["5x5 training", "5x5 validation", "3x3 training", "3x3 validation"]
plot_model_results(metrics_arr, [2, 1], lbs, ["Epochs", "Epochs"], ["Accuracy", "Loss"], 
                   ["Accuracy vs Epochs", 
                    "Loss vs Epochs"],
                   "Color (Normalised)  - Accuracy and Loss of models"
                    , fig_size=(17, 5))     
Length of axis: (2,)

Interestingly, this time the roles are reversed: the 3x3 model performs worse on the validation set, and the two are virtually tied on the test set. I don't have a clear explanation, other than overfitting setting in more quickly with the 3x3 model. We turn next to grayscale normalised images.

Grayscale

In [50]:
# Normalise grayscale images too
X_train_grayscale_normalised = normalise_images(X_train_grayscale, X_train_grayscale)
X_valid_grayscale_normalised = normalise_images(X_valid_grayscale, X_train_grayscale)
X_test_grayscale_normalised = normalise_images(X_test_grayscale, X_train_grayscale)
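Judging from the call signature above, `normalise_images(X, X_train_grayscale)` presumably standardises its first argument using statistics computed from the training set, so that validation and test data are scaled consistently with the data the model was trained on. A minimal sketch of that assumed behaviour:

```python
import numpy as np

def normalise_images(images, stats_images):
    """Standardise each pixel using the mean and standard deviation of the
    training set (stats_images). A sketch of the assumed behaviour of the
    notebook's normalise_images helper; the real one may differ."""
    mean = np.mean(stats_images, axis=0)
    std = np.std(stats_images, axis=0) + 1e-8  # guard against zero variance
    return (images.astype(np.float32) - mean) / std

# Hypothetical demo batch standing in for the grayscale training set
X_demo = np.random.default_rng(0).normal(10.0, 3.0, size=(50, 32, 32, 1))
X_demo_norm = normalise_images(X_demo, X_demo)  # per-pixel mean ~0, std ~1
```

Passing the training set as `stats_images` for all three splits, as in the cell above, avoids leaking validation or test statistics into preprocessing.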
In [51]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Grayscale_Norm_3x3", [32, 32, 1], [3, 32, 3], [120, 84], n_classes, [1.0, 1.0])
mc_5x5 = ModelConfig(EdLeNet, "EdLeNet_Grayscale_Norm_5x5", [32, 32, 1], [5, 32, 2], [120, 84], n_classes, [1.0, 1.0])
In [52]:
me_g_norm_3x3 = ModelExecutor(mc_3x3)
me_g_norm_5x5 = ModelExecutor(mc_5x5)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=5, conv start depth=32, fc layers=2]
In [53]:
(g_norm_3x3_tr_metrics, g_norm_3x3_val_metrics, g_norm_3x3_duration) = me_g_norm_3x3.train_model(X_train_grayscale_normalised, y_train, X_valid_grayscale_normalised, y_valid, epochs=100)
(g_norm_3x3_ts_accuracy, g_norm_3x3_ts_loss, g_norm_3x3_ts_duration) =  me_g_norm_3x3.test_model(X_test_grayscale_normalised, y_test)
Training EdLeNet_Grayscale_Norm_3x3 [epochs=100, batch_size=512]...

[1]	total=5.308s | train: time=3.250s, loss=1.6293, acc=0.5633 | val: time=2.058s, loss=1.9448, acc=0.4968
[10]	total=5.155s | train: time=3.114s, loss=0.0418, acc=0.9907 | val: time=2.042s, loss=0.4513, acc=0.8968
[20]	total=5.153s | train: time=3.110s, loss=0.0041, acc=0.9998 | val: time=2.044s, loss=0.3829, acc=0.9299
[30]	total=5.151s | train: time=3.111s, loss=0.0012, acc=1.0000 | val: time=2.040s, loss=0.4178, acc=0.9306
[40]	total=5.155s | train: time=3.111s, loss=0.0004, acc=1.0000 | val: time=2.043s, loss=0.4528, acc=0.9345
[50]	total=5.152s | train: time=3.110s, loss=0.0002, acc=1.0000 | val: time=2.042s, loss=0.4947, acc=0.9308
[60]	total=5.154s | train: time=3.110s, loss=0.0001, acc=1.0000 | val: time=2.044s, loss=0.5249, acc=0.9283
[70]	total=5.155s | train: time=3.109s, loss=0.0001, acc=1.0000 | val: time=2.046s, loss=0.5570, acc=0.9288
[80]	total=5.159s | train: time=3.114s, loss=0.0000, acc=1.0000 | val: time=2.046s, loss=0.5908, acc=0.9268
[90]	total=5.150s | train: time=3.109s, loss=0.0023, acc=1.0000 | val: time=2.041s, loss=0.3829, acc=0.9336
[100]	total=5.151s | train: time=3.110s, loss=0.0005, acc=1.0000 | val: time=2.040s, loss=0.4741, acc=0.9311
Model ./models/EdLeNet_Grayscale_Norm_3x3.chkpt saved
[EdLeNet_Grayscale_Norm_3x3 - Test Set]	time=0.676s, loss=0.7154, acc=0.9219
In [54]:
(g_norm_5x5_tr_metrics, g_norm_5x5_val_metrics, g_norm_5x5_duration) = me_g_norm_5x5.train_model(X_train_grayscale_normalised, y_train, X_valid_grayscale_normalised, y_valid, epochs=100)
(g_norm_5x5_ts_accuracy, g_norm_5x5_ts_loss, g_norm_5x5_ts_duration) =  me_g_norm_5x5.test_model(X_test_grayscale_normalised, y_test)
Training EdLeNet_Grayscale_Norm_5x5 [epochs=100, batch_size=512]...

[1]	total=4.248s | train: time=2.527s, loss=0.8447, acc=0.7732 | val: time=1.721s, loss=1.0992, acc=0.6880
[10]	total=4.212s | train: time=2.511s, loss=0.0180, acc=0.9978 | val: time=1.701s, loss=0.3543, acc=0.9186
[20]	total=4.210s | train: time=2.509s, loss=0.0118, acc=0.9966 | val: time=1.701s, loss=0.3736, acc=0.9327
[30]	total=4.205s | train: time=2.508s, loss=0.0005, acc=1.0000 | val: time=1.697s, loss=0.4296, acc=0.9293
[40]	total=4.211s | train: time=2.505s, loss=0.0003, acc=1.0000 | val: time=1.705s, loss=0.4725, acc=0.9295
[50]	total=4.210s | train: time=2.512s, loss=0.0002, acc=1.0000 | val: time=1.698s, loss=0.4852, acc=0.9347
[60]	total=4.206s | train: time=2.504s, loss=0.0002, acc=1.0000 | val: time=1.702s, loss=0.3171, acc=0.9463
[70]	total=4.207s | train: time=2.505s, loss=0.0001, acc=1.0000 | val: time=1.702s, loss=0.3466, acc=0.9449
[80]	total=4.209s | train: time=2.507s, loss=0.0000, acc=1.0000 | val: time=1.702s, loss=0.3619, acc=0.9440
[90]	total=4.207s | train: time=2.505s, loss=0.0000, acc=1.0000 | val: time=1.703s, loss=0.3743, acc=0.9442
[100]	total=4.204s | train: time=2.503s, loss=0.0000, acc=1.0000 | val: time=1.701s, loss=0.3921, acc=0.9431
Model ./models/EdLeNet_Grayscale_Norm_5x5.chkpt saved
[EdLeNet_Grayscale_Norm_5x5 - Test Set]	time=0.564s, loss=0.7372, acc=0.9363
In [55]:
metrics_arr = [g_norm_5x5_tr_metrics, g_norm_5x5_val_metrics, g_norm_3x3_tr_metrics, g_norm_3x3_val_metrics]
lbs = ["5x5 training", "5x5 validation", "3x3 training", "3x3 validation"]
plot_model_results(metrics_arr, [2, 1], lbs, ["Epochs", "Epochs"], ["Accuracy", "Loss"], 
                   ["Accuracy vs Epochs", 
                    "Loss vs Epochs"],
                   "Grayscale (Normalised)  - Accuracy and Loss of models"
                    , fig_size=(17, 5))     
Length of axis: (2,)

The 3x3 model shows more erratic behaviour, which is hard to explain at this stage. Overall, it performs slightly better. Sadly, neither of the two models reaches 93% test accuracy. It is time to turn our attention to dropout.

Dropout - Keep 75% of weights in Fully Connected Layers

Let's start with a keep probability of 75% in the fully connected layers and see what kind of results we get. We will also keep weights in the convolutional layers with a probability of 90%.
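As a reminder of the mechanism, the keep probabilities `[0.9, 0.75]` passed to ModelConfig presumably drive inverted dropout: each unit survives with probability `keep_prob`, and the survivors are scaled by `1/keep_prob` so the expected activation is unchanged. A NumPy sketch of the idea (TensorFlow's dropout op performs this scaling internally):

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each unit with probability (1 - keep_prob)
    and scale the survivors by 1/keep_prob, preserving the expected
    activation. Illustrative sketch, not the notebook's implementation."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
fc = np.ones((4, 120))            # hypothetical fully connected activations
dropped = dropout(fc, 0.75, rng)  # ~25% of units zeroed, rest scaled by 4/3
```

At test time no units are dropped (equivalent to `keep_prob=1.0`), which is why evaluation needs no rescaling.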

Color (Normalised)

In [56]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_3x3_Color_Dropout_0.75", [32, 32, 3], [3, 32, 3], [120, 84], n_classes, [0.9, 0.75])
mc_5x5 = ModelConfig(EdLeNet, "EdLeNet_5x5_Color_Dropout_0.75", [32, 32, 3], [5, 32, 2], [120, 84], n_classes, [0.9, 0.75])
In [57]:
me_c_norm_drpt_0_75_3x3 = ModelExecutor(mc_3x3)
me_c_norm_drpt_0_75_5x5 = ModelExecutor(mc_5x5)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=5, conv start depth=32, fc layers=2]
In [58]:
(c_norm_drpt_0_75_3x3_tr_metrics, c_norm_drpt_0_75_3x3_val_metrics, c_norm_drpt_0_75_3x3_duration) = me_c_norm_drpt_0_75_3x3.train_model(X_train_normalised, y_train, X_valid_normalised, y_valid, epochs=100)
(c_norm_drpt_0_75_3x3_ts_accuracy, c_norm_drpt_0_75_3x3_ts_loss, c_norm_drpt_0_75_3x3_ts_duration) =  me_c_norm_drpt_0_75_3x3.test_model(X_test_normalised, y_test)
Training EdLeNet_3x3_Color_Dropout_0.75 [epochs=100, batch_size=512]...

[1]	total=5.979s | train: time=3.548s, loss=2.0883, acc=0.4519 | val: time=2.431s, loss=2.2431, acc=0.3868
[10]	total=5.936s | train: time=3.524s, loss=0.0221, acc=0.9959 | val: time=2.412s, loss=0.1535, acc=0.9658
[20]	total=5.938s | train: time=3.528s, loss=0.0035, acc=0.9996 | val: time=2.410s, loss=0.1180, acc=0.9694
[30]	total=5.928s | train: time=3.522s, loss=0.0009, acc=0.9999 | val: time=2.406s, loss=0.1066, acc=0.9753
[40]	total=5.937s | train: time=3.523s, loss=0.0004, acc=0.9999 | val: time=2.414s, loss=0.1110, acc=0.9764
[50]	total=5.940s | train: time=3.523s, loss=0.0006, acc=0.9999 | val: time=2.417s, loss=0.1013, acc=0.9789
[60]	total=5.933s | train: time=3.523s, loss=0.0003, acc=0.9999 | val: time=2.410s, loss=0.0978, acc=0.9805
[70]	total=5.934s | train: time=3.528s, loss=0.0002, acc=0.9999 | val: time=2.406s, loss=0.1021, acc=0.9810
[80]	total=5.935s | train: time=3.524s, loss=0.0001, acc=1.0000 | val: time=2.410s, loss=0.1237, acc=0.9798
[90]	total=5.930s | train: time=3.520s, loss=0.0002, acc=0.9999 | val: time=2.410s, loss=0.1354, acc=0.9771
[100]	total=5.935s | train: time=3.523s, loss=0.0001, acc=1.0000 | val: time=2.411s, loss=0.1051, acc=0.9823
Model ./models/EdLeNet_3x3_Color_Dropout_0.75.chkpt saved
[EdLeNet_3x3_Color_Dropout_0.75 - Test Set]	time=0.798s, loss=0.2937, acc=0.9648
In [59]:
(c_norm_drpt_0_75_5x5_tr_metrics, c_norm_drpt_0_75_5x5_val_metrics, c_norm_drpt_0_75_5x5_duration) = me_c_norm_drpt_0_75_5x5.train_model(X_train_normalised, y_train, X_valid_normalised, y_valid, epochs=100)
(c_norm_drpt_0_75_5x5_ts_accuracy, c_norm_drpt_0_75_5x5_ts_loss, c_norm_drpt_0_75_5x5_ts_duration) =  me_c_norm_drpt_0_75_5x5.test_model(X_test_normalised, y_test)
Training EdLeNet_5x5_Color_Dropout_0.75 [epochs=100, batch_size=512]...

[1]	total=5.334s | train: time=3.167s, loss=1.8157, acc=0.4894 | val: time=2.167s, loss=2.0400, acc=0.4116
[10]	total=5.326s | train: time=3.161s, loss=0.0257, acc=0.9950 | val: time=2.165s, loss=0.1484, acc=0.9590
[20]	total=5.325s | train: time=3.149s, loss=0.0046, acc=0.9992 | val: time=2.176s, loss=0.1328, acc=0.9678
[30]	total=5.328s | train: time=3.152s, loss=0.0013, acc=0.9998 | val: time=2.175s, loss=0.1018, acc=0.9746
[40]	total=5.315s | train: time=3.146s, loss=0.0007, acc=0.9999 | val: time=2.169s, loss=0.1207, acc=0.9755
[50]	total=5.326s | train: time=3.156s, loss=0.0004, acc=1.0000 | val: time=2.170s, loss=0.1457, acc=0.9726
[60]	total=5.313s | train: time=3.146s, loss=0.0002, acc=1.0000 | val: time=2.167s, loss=0.1487, acc=0.9721
[70]	total=5.312s | train: time=3.148s, loss=0.0003, acc=0.9999 | val: time=2.164s, loss=0.1501, acc=0.9703
[80]	total=5.306s | train: time=3.141s, loss=0.0003, acc=1.0000 | val: time=2.165s, loss=0.1753, acc=0.9705
[90]	total=5.309s | train: time=3.150s, loss=0.0001, acc=1.0000 | val: time=2.159s, loss=0.2665, acc=0.9685
[100]	total=5.301s | train: time=3.140s, loss=0.0002, acc=0.9999 | val: time=2.161s, loss=0.2434, acc=0.9667
Model ./models/EdLeNet_5x5_Color_Dropout_0.75.chkpt saved
[EdLeNet_5x5_Color_Dropout_0.75 - Test Set]	time=0.717s, loss=0.2325, acc=0.9675
In [60]:
metrics_arr = [c_norm_drpt_0_75_5x5_tr_metrics, c_norm_drpt_0_75_5x5_val_metrics, c_norm_drpt_0_75_3x3_tr_metrics, c_norm_drpt_0_75_3x3_val_metrics]
lbs = ["5x5 training", "5x5 validation", "3x3 training", "3x3 validation"]
plot_model_results(metrics_arr, [2, 1], lbs, ["Epochs", "Epochs"], ["Accuracy", "Loss"], 
                   ["Accuracy vs Epochs", 
                    "Loss vs Epochs"],
                   "Color (Normalised) - Accuracy and Loss of models with dropout (p-conv=0.9, p-fc=0.75)"
                    , fig_size=(17, 5))     
Length of axis: (2,)

We are seeing some great results with the introduction of dropout: both models now exceed 96% test accuracy, with the 5x5 slightly ahead of the 3x3. We should get similar, if not better, results with grayscale images.

Grayscale (Normalised)

In [61]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Grayscale_Norm_3x3_Dropout_0.75", [32, 32, 1], [3, 32, 3], [120, 84], n_classes, [0.9, 0.75])
mc_5x5 = ModelConfig(EdLeNet, "EdLeNet_Grayscale_Norm_5x5_Dropout_0.75", [32, 32, 1], [5, 32, 2], [120, 84], n_classes, [0.9, 0.75])
In [62]:
me_g_norm_drpt_0_75_3x3 = ModelExecutor(mc_3x3)
me_g_norm_drpt_0_75_5x5 = ModelExecutor(mc_5x5)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=5, conv start depth=32, fc layers=2]
In [63]:
(g_norm_drpt_0_75_3x3_tr_metrics, g_norm_drpt_0_75_3x3_val_metrics, g_norm_drpt_0_75_3x3_duration) = me_g_norm_drpt_0_75_3x3.train_model(X_train_grayscale_normalised, y_train, X_valid_grayscale_normalised, y_valid, epochs=100)
(g_norm_drpt_0_75_3x3_ts_accuracy, g_norm_drpt_0_75_3x3_ts_loss, g_norm_drpt_0_75_ts_3x3_duration) =  me_g_norm_drpt_0_75_3x3.test_model(X_test_grayscale_normalised, y_test)
Training EdLeNet_Grayscale_Norm_3x3_Dropout_0.75 [epochs=100, batch_size=512]...

[1]	total=5.234s | train: time=3.147s, loss=2.6605, acc=0.3427 | val: time=2.086s, loss=2.8468, acc=0.3009
[10]	total=5.201s | train: time=3.125s, loss=0.0331, acc=0.9934 | val: time=2.076s, loss=0.1449, acc=0.9596
[20]	total=5.192s | train: time=3.126s, loss=0.0062, acc=0.9991 | val: time=2.067s, loss=0.0996, acc=0.9721
[30]	total=5.194s | train: time=3.124s, loss=0.0015, acc=0.9998 | val: time=2.070s, loss=0.0950, acc=0.9773
[40]	total=5.191s | train: time=3.123s, loss=0.0005, acc=1.0000 | val: time=2.068s, loss=0.0811, acc=0.9807
[50]	total=5.196s | train: time=3.124s, loss=0.0002, acc=1.0000 | val: time=2.071s, loss=0.0849, acc=0.9821
[60]	total=5.187s | train: time=3.122s, loss=0.0002, acc=1.0000 | val: time=2.065s, loss=0.0828, acc=0.9823
[70]	total=5.199s | train: time=3.126s, loss=0.0003, acc=0.9999 | val: time=2.073s, loss=0.0864, acc=0.9814
[80]	total=5.194s | train: time=3.123s, loss=0.0001, acc=1.0000 | val: time=2.072s, loss=0.0929, acc=0.9814
[90]	total=5.193s | train: time=3.122s, loss=0.0001, acc=1.0000 | val: time=2.071s, loss=0.0790, acc=0.9848
[100]	total=5.189s | train: time=3.120s, loss=0.0005, acc=1.0000 | val: time=2.069s, loss=0.0881, acc=0.9834
Model ./models/EdLeNet_Grayscale_Norm_3x3_Dropout_0.75.chkpt saved
[EdLeNet_Grayscale_Norm_3x3_Dropout_0.75 - Test Set]	time=0.688s, loss=0.2150, acc=0.9637
In [64]:
(g_norm_drpt_0_75_5x5_tr_metrics, g_norm_drpt_0_75_5x5_val_metrics, g_norm_drpt_0_75_5x5_duration) = me_g_norm_drpt_0_75_5x5.train_model(X_train_grayscale_normalised, y_train, X_valid_grayscale_normalised, y_valid, epochs=100)
(g_norm_drpt_0_75_5x5_ts_accuracy, g_norm_drpt_0_75_5x5_ts_loss, g_norm_drpt_0_75_ts_5x5_duration) =  me_g_norm_drpt_0_75_5x5.test_model(X_test_grayscale_normalised, y_test)
Training EdLeNet_Grayscale_Norm_5x5_Dropout_0.75 [epochs=100, batch_size=512]...

[1]	total=4.280s | train: time=2.533s, loss=2.0528, acc=0.4927 | val: time=1.747s, loss=2.2621, acc=0.4424
[10]	total=4.243s | train: time=2.512s, loss=0.0370, acc=0.9938 | val: time=1.731s, loss=0.2205, acc=0.9472
[20]	total=4.248s | train: time=2.516s, loss=0.0080, acc=0.9990 | val: time=1.732s, loss=0.1210, acc=0.9685
[30]	total=4.242s | train: time=2.515s, loss=0.0020, acc=0.9997 | val: time=1.727s, loss=0.1261, acc=0.9651
[40]	total=4.256s | train: time=2.522s, loss=0.0009, acc=1.0000 | val: time=1.734s, loss=0.1018, acc=0.9739
[50]	total=4.251s | train: time=2.518s, loss=0.0003, acc=1.0000 | val: time=1.733s, loss=0.0927, acc=0.9810
[60]	total=4.245s | train: time=2.518s, loss=0.0002, acc=1.0000 | val: time=1.727s, loss=0.1032, acc=0.9766
[70]	total=4.237s | train: time=2.513s, loss=0.0001, acc=1.0000 | val: time=1.725s, loss=0.0962, acc=0.9807
[80]	total=4.238s | train: time=2.513s, loss=0.0001, acc=0.9999 | val: time=1.725s, loss=0.0991, acc=0.9832
[90]	total=4.245s | train: time=2.513s, loss=0.0001, acc=1.0000 | val: time=1.732s, loss=0.1178, acc=0.9789
[100]	total=4.239s | train: time=2.509s, loss=0.0006, acc=0.9998 | val: time=1.730s, loss=0.1270, acc=0.9807
Model ./models/EdLeNet_Grayscale_Norm_5x5_Dropout_0.75.chkpt saved
[EdLeNet_Grayscale_Norm_5x5_Dropout_0.75 - Test Set]	time=0.576s, loss=0.2843, acc=0.9617
In [65]:
metrics_arr = [g_norm_drpt_0_75_5x5_tr_metrics, g_norm_drpt_0_75_5x5_val_metrics, g_norm_drpt_0_75_3x3_tr_metrics, g_norm_drpt_0_75_3x3_val_metrics]
lbs = ["5x5 training", "5x5 validation", "3x3 training", "3x3 validation"]
plot_model_results(metrics_arr, [2, 1], lbs, ["Epochs", "Epochs"], ["Accuracy", "Loss"], 
                   ["Accuracy vs Epochs", 
                    "Loss vs Epochs"],
                   "Grayscale (Normalised) - Accuracy and Loss of models with dropout (p-conv=0.9, p-fc=0.75)"
                    , fig_size=(17, 5))     
Length of axis: (2,)

Once again, the 3x3 and 5x5 variants achieve very similar accuracy across all dataset types. Interestingly, the grayscale 3x3 model achieves lower loss, but also lower accuracy, than its 3x3 counterpart trained on color images. More runs would be needed to determine which performs better in the long run, but we believe we need not go that far: even more aggressive dropout values should yield better results.
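The keep-probability mechanics behind these dropout experiments can be sketched in NumPy. This is illustrative only: the model itself presumably uses TensorFlow's dropout op, and `inverted_dropout` is a hypothetical helper, not part of `utils.py`.

```python
import numpy as np

def inverted_dropout(activations, keep_prob, rng):
    # Each unit survives independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors by 1/keep_prob so the expected activation is
    # unchanged, and no rescaling is needed at inference time.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 8))
dropped = inverted_dropout(a, keep_prob=0.5, rng=rng)
```

With `keep_prob=0.5`, surviving units are doubled to 2.0 and the rest zeroed, which is why the training-time and test-time scales match without extra bookkeeping.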

Dropout - Keep Fully Connected Weights with a 50% probability

Grayscale

In [66]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Norm_Grayscale_3x3_Dropout_0.50", [32, 32, 1], [3, 32, 3], [120, 84], n_classes, [0.75, 0.5])
mc_5x5 = ModelConfig(EdLeNet, "EdLeNet_Norm_Grayscale_5x5_Dropout_0.50", [32, 32, 1], [5, 32, 2], [120, 84], n_classes, [0.75, 0.5])
In [67]:
me_g_norm_drpt_0_50_3x3 = ModelExecutor(mc_3x3)
me_g_norm_drpt_0_50_5x5 = ModelExecutor(mc_5x5)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=5, conv start depth=32, fc layers=2]
In [68]:
(g_norm_drpt_0_50_3x3_tr_metrics, g_norm_drpt_0_50_3x3_val_metrics, g_norm_drpt_0_50_3x3_duration) = me_g_norm_drpt_0_50_3x3.train_model(X_train_grayscale_normalised, y_train, X_valid_grayscale_normalised, y_valid, epochs=100)
(g_norm_drpt_0_50_3x3_ts_accuracy, g_norm_drpt_0_50_3x3_ts_loss, g_norm_drpt_0_50_ts_3x3_duration) =  me_g_norm_drpt_0_50_3x3.test_model(X_test_grayscale_normalised, y_test)
Training EdLeNet_Norm_Grayscale_3x3_Dropout_0.50 [epochs=100, batch_size=512]...

[1]	total=5.222s | train: time=3.139s, loss=3.4993, acc=0.1047 | val: time=2.083s, loss=3.5613, acc=0.1007
[10]	total=5.190s | train: time=3.122s, loss=0.2589, acc=0.9360 | val: time=2.067s, loss=0.3260, acc=0.8973
[20]	total=5.191s | train: time=3.122s, loss=0.0448, acc=0.9907 | val: time=2.069s, loss=0.0990, acc=0.9769
[30]	total=5.191s | train: time=3.122s, loss=0.0176, acc=0.9963 | val: time=2.070s, loss=0.0843, acc=0.9732
[40]	total=5.201s | train: time=3.127s, loss=0.0081, acc=0.9980 | val: time=2.074s, loss=0.0570, acc=0.9855
[50]	total=5.199s | train: time=3.125s, loss=0.0036, acc=0.9995 | val: time=2.073s, loss=0.0648, acc=0.9825
[60]	total=5.200s | train: time=3.125s, loss=0.0023, acc=0.9997 | val: time=2.075s, loss=0.0582, acc=0.9850
[70]	total=5.195s | train: time=3.124s, loss=0.0015, acc=0.9998 | val: time=2.070s, loss=0.0568, acc=0.9864
[80]	total=5.194s | train: time=3.123s, loss=0.0009, acc=0.9999 | val: time=2.071s, loss=0.0556, acc=0.9884
[90]	total=5.193s | train: time=3.120s, loss=0.0006, acc=0.9999 | val: time=2.074s, loss=0.0747, acc=0.9841
[100]	total=5.191s | train: time=3.123s, loss=0.0004, acc=1.0000 | val: time=2.068s, loss=0.0849, acc=0.9832
Model ./models/EdLeNet_Norm_Grayscale_3x3_Dropout_0.50.chkpt saved
[EdLeNet_Norm_Grayscale_3x3_Dropout_0.50 - Test Set]	time=0.686s, loss=0.1119, acc=0.9755
In [69]:
(g_norm_drpt_0_50_5x5_tr_metrics, g_norm_drpt_0_50_5x5_val_metrics, g_norm_drpt_0_50_5x5_duration) = me_g_norm_drpt_0_50_5x5.train_model(X_train_grayscale_normalised, y_train, X_valid_grayscale_normalised, y_valid, epochs=100)
(g_norm_drpt_0_50_5x5_ts_accuracy, g_norm_drpt_0_50_5x5_ts_loss, g_norm_drpt_0_50_ts_5x5_duration) =  me_g_norm_drpt_0_50_5x5.test_model(X_test_grayscale_normalised, y_test)
Training EdLeNet_Norm_Grayscale_5x5_Dropout_0.50 [epochs=100, batch_size=512]...

[1]	total=4.264s | train: time=2.525s, loss=3.4130, acc=0.1862 | val: time=1.739s, loss=3.4703, acc=0.1474
[10]	total=4.199s | train: time=2.502s, loss=0.2369, acc=0.9447 | val: time=1.697s, loss=0.3337, acc=0.9088
[20]	total=4.196s | train: time=2.500s, loss=0.0568, acc=0.9897 | val: time=1.696s, loss=0.1326, acc=0.9635
[30]	total=4.196s | train: time=2.499s, loss=0.0225, acc=0.9970 | val: time=1.697s, loss=0.1078, acc=0.9698
[40]	total=4.198s | train: time=2.500s, loss=0.0095, acc=0.9987 | val: time=1.699s, loss=0.0845, acc=0.9694
[50]	total=4.192s | train: time=2.495s, loss=0.0048, acc=0.9994 | val: time=1.696s, loss=0.0889, acc=0.9705
[60]	total=4.187s | train: time=2.490s, loss=0.0034, acc=0.9995 | val: time=1.697s, loss=0.0818, acc=0.9751
[70]	total=4.188s | train: time=2.497s, loss=0.0020, acc=0.9999 | val: time=1.691s, loss=0.0666, acc=0.9785
[80]	total=4.189s | train: time=2.491s, loss=0.0028, acc=0.9994 | val: time=1.698s, loss=0.0729, acc=0.9794
[90]	total=4.187s | train: time=2.494s, loss=0.0008, acc=0.9999 | val: time=1.693s, loss=0.0834, acc=0.9753
[100]	total=4.183s | train: time=2.489s, loss=0.0008, acc=0.9999 | val: time=1.694s, loss=0.0612, acc=0.9814
Model ./models/EdLeNet_Norm_Grayscale_5x5_Dropout_0.50.chkpt saved
[EdLeNet_Norm_Grayscale_5x5_Dropout_0.50 - Test Set]	time=0.573s, loss=0.1360, acc=0.9717
In [70]:
metrics_arr = [g_norm_drpt_0_50_5x5_tr_metrics, g_norm_drpt_0_50_5x5_val_metrics, g_norm_drpt_0_50_3x3_tr_metrics, g_norm_drpt_0_50_3x3_val_metrics]
lbs = ["5x5 training", "5x5 validation", "3x3 training", "3x3 validation "]
plot_model_results(metrics_arr, [2, 1], lbs, ["Epochs", "Epochs"], ["Accuracy", "Loss"], 
                   ["Accuracy vs Epochs", 
                    "Loss vs Epochs"],
                   "Grayscale (normalised)  - Accuracy and Loss of models with dropout (p-conv=0.75, p-fc=0.5)"
                    , fig_size=(17, 5))
Length of axis: (2,)

Both models show smooth, satisfying curves. The 3x3 model clearly seems to perform best. While we score above 98% accuracy on the validation set, we have not yet broken through that barrier on the test set, but we are close!

Color

In [71]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Color_Norm_3x3_Dropout_0.40", [32, 32, 3], [3, 32, 3], [120, 84], n_classes, [0.75, 0.5])
mc_5x5 = ModelConfig(EdLeNet, "EdLeNet_Color_Norm_5x5_Dropout_0.40", [32, 32, 3], [5, 32, 2], [120, 84], n_classes, [0.75, 0.5])
In [72]:
me_c_norm_drpt_0_50_3x3 = ModelExecutor(mc_3x3, learning_rate=0.001)
me_c_norm_drpt_0_50_5x5 = ModelExecutor(mc_5x5, learning_rate=0.001)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=5, conv start depth=32, fc layers=2]
In [73]:
(c_norm_drpt_0_50_3x3_tr_metrics, c_norm_drpt_0_50_3x3_val_metrics, c_norm_drpt_0_50_3x3_duration) = me_c_norm_drpt_0_50_3x3.train_model(X_train_normalised, y_train, X_valid_normalised, y_valid, epochs=100)
(c_norm_drpt_0_50_3x3_ts_accuracy, c_norm_drpt_0_50_3x3_ts_loss, c_norm_drpt_0_50_3x3_ts_duration) =  me_c_norm_drpt_0_50_3x3.test_model(X_test_normalised, y_test)
Training EdLeNet_Color_Norm_3x3_Dropout_0.40 [epochs=100, batch_size=512]...

[1]	total=5.979s | train: time=3.549s, loss=3.5526, acc=0.1000 | val: time=2.430s, loss=3.5821, acc=0.0900
[10]	total=5.947s | train: time=3.527s, loss=0.5104, acc=0.8560 | val: time=2.420s, loss=0.6319, acc=0.8245
[20]	total=5.945s | train: time=3.538s, loss=0.0717, acc=0.9856 | val: time=2.408s, loss=0.1433, acc=0.9594
[30]	total=5.942s | train: time=3.526s, loss=0.0198, acc=0.9962 | val: time=2.416s, loss=0.0820, acc=0.9744
[40]	total=5.957s | train: time=3.538s, loss=0.0087, acc=0.9983 | val: time=2.419s, loss=0.0691, acc=0.9810
[50]	total=5.940s | train: time=3.523s, loss=0.0045, acc=0.9989 | val: time=2.416s, loss=0.0587, acc=0.9855
[60]	total=5.943s | train: time=3.526s, loss=0.0030, acc=0.9992 | val: time=2.417s, loss=0.0572, acc=0.9837
[70]	total=5.950s | train: time=3.536s, loss=0.0015, acc=0.9997 | val: time=2.414s, loss=0.0508, acc=0.9875
[80]	total=5.944s | train: time=3.524s, loss=0.0009, acc=0.9998 | val: time=2.419s, loss=0.0493, acc=0.9880
[90]	total=5.939s | train: time=3.524s, loss=0.0006, acc=0.9999 | val: time=2.414s, loss=0.0618, acc=0.9844
[100]	total=5.956s | train: time=3.539s, loss=0.0006, acc=0.9999 | val: time=2.417s, loss=0.0615, acc=0.9859
Model ./models/EdLeNet_Color_Norm_3x3_Dropout_0.40.chkpt saved
[EdLeNet_Color_Norm_3x3_Dropout_0.40 - Test Set]	time=0.792s, loss=0.1329, acc=0.9754
In [74]:
(c_norm_drpt_0_50_5x5_tr_metrics, c_norm_drpt_0_50_5x5_val_metrics, c_norm_drpt_0_50_5x5_duration) = me_c_norm_drpt_0_50_5x5.train_model(X_train_normalised, y_train, X_valid_normalised, y_valid, epochs=100)
(c_norm_drpt_0_50_5x5_ts_accuracy, c_norm_drpt_0_50_5x5_ts_loss, c_norm_drpt_0_50_5x5_ts_duration) =  me_c_norm_drpt_0_50_5x5.test_model(X_test_normalised, y_test)
Training EdLeNet_Color_Norm_5x5_Dropout_0.40 [epochs=100, batch_size=512]...

[1]	total=5.354s | train: time=3.172s, loss=3.4070, acc=0.1899 | val: time=2.182s, loss=3.4436, acc=0.1703
[10]	total=5.315s | train: time=3.148s, loss=0.3248, acc=0.9240 | val: time=2.167s, loss=0.4420, acc=0.8912
[20]	total=5.321s | train: time=3.153s, loss=0.0726, acc=0.9886 | val: time=2.168s, loss=0.2155, acc=0.9515
[30]	total=5.322s | train: time=3.160s, loss=0.0231, acc=0.9967 | val: time=2.163s, loss=0.1538, acc=0.9667
[40]	total=5.314s | train: time=3.143s, loss=0.0085, acc=0.9986 | val: time=2.171s, loss=0.1140, acc=0.9689
[50]	total=5.313s | train: time=3.145s, loss=0.0045, acc=0.9993 | val: time=2.168s, loss=0.1125, acc=0.9707
[60]	total=5.313s | train: time=3.144s, loss=0.0032, acc=0.9997 | val: time=2.169s, loss=0.1530, acc=0.9737
[70]	total=5.301s | train: time=3.143s, loss=0.0025, acc=0.9994 | val: time=2.158s, loss=0.1707, acc=0.9735
[80]	total=5.306s | train: time=3.143s, loss=0.0010, acc=0.9999 | val: time=2.162s, loss=0.1390, acc=0.9757
[90]	total=5.295s | train: time=3.134s, loss=0.0048, acc=0.9991 | val: time=2.161s, loss=0.1393, acc=0.9739
[100]	total=5.300s | train: time=3.137s, loss=0.0015, acc=0.9999 | val: time=2.162s, loss=0.1563, acc=0.9789
Model ./models/EdLeNet_Color_Norm_5x5_Dropout_0.40.chkpt saved
[EdLeNet_Color_Norm_5x5_Dropout_0.40 - Test Set]	time=0.714s, loss=0.1685, acc=0.9703
In [75]:
metrics_arr = [c_norm_drpt_0_50_5x5_tr_metrics, c_norm_drpt_0_50_5x5_val_metrics, c_norm_drpt_0_50_3x3_tr_metrics, c_norm_drpt_0_50_3x3_val_metrics]
lbs = ["5x5 training", "5x5 validation", "3x3 training", "3x3 validation "]
plot_model_results(metrics_arr, [2, 1], lbs, ["Epochs", "Epochs"], ["Accuracy", "Loss"], 
                   ["Accuracy vs Epochs", 
                    "Loss vs Epochs"],
                   "Color (normalised)  - Accuracy and Loss of models with dropout (p-conv=0.75, p-fc=0.5)"
                    , fig_size=(17, 5))
Length of axis: (2,)

Color and grayscale images exhibit very similar behaviour, with perhaps a slight edge to grayscale, which achieves lower loss on the test set. At this stage we cannot pick between grayscale and color input, since both produce very similar results. Perhaps color matters little if the model performs this well on grayscale images. Later, we will turn to augmenting images and explore whether randomly applying distortions yields better results; first, we address image contrast.

Histogram Equalization

We observed earlier that some of the images in the dataset are quite dark and others blurry. In this part we tackle that problem by applying histogram equalization to improve the contrast of our images, making it easier for the model to discern, with a higher level of confidence, which class a supplied image belongs to. To do so, we rely on OpenCV's CLAHE (Contrast Limited Adaptive Histogram Equalization) function.
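The idea underlying CLAHE can be illustrated with plain global histogram equalization in NumPy: map each intensity through the image's normalised cumulative histogram so that intensities spread across the full 0-255 range. CLAHE refines this by equalising per tile and clipping the histogram to limit noise amplification; this sketch (hypothetical helper `equalise_histogram`, synthetic low-contrast image) shows only the global version.

```python
import numpy as np

def equalise_histogram(img):
    """Global histogram equalisation for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF at the darkest intensity present
    # Map each intensity through the normalised CDF onto [0, 255].
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# Synthetic low-contrast image: intensities squeezed into [100, 140).
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 140, size=(32, 32)).astype(np.uint8)
equalised = equalise_histogram(low_contrast)
```

After equalisation the darkest present intensity maps to 0 and the brightest to 255, stretching the contrast considerably.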

In [463]:
g_img = np.reshape(X_train_grayscale[15796], (32, 32))
plt.imshow(g_img, cmap='gray')
Out[463]:
<matplotlib.image.AxesImage at 0x7fd5700f1dd8>
In [464]:
clahe = cv2.createCLAHE(tileGridSize=(4,4), clipLimit=40.0)
cl1 = clahe.apply(g_img)
plt.imshow(cl1, cmap='gray')
Out[464]:
<matplotlib.image.AxesImage at 0x7fd5d0bcd400>
In [474]:
clahe = cv2.createCLAHE(tileGridSize=(2,2), clipLimit=15.0)
cl2 = clahe.apply(g_img)
plt.imshow(cl2, cmap='gray')
show_image_list([g_img, cl1, cl2], ["L0", "L1", "L2"], "", cols=3, fig_size=(3, 3), show_ticks=False)

We choose CLAHE with a grid size of 2x2 and a clip limit of 15.

In [477]:
X_train_grayscale_equalized = np.asarray(list(map(lambda img: clahe.apply(np.reshape(img, (32, 32))), X_train_grayscale)))
X_valid_grayscale_equalized = np.asarray(list(map(lambda img: clahe.apply(np.reshape(img, (32, 32))), X_valid_grayscale)))
X_test_grayscale_equalized = np.asarray(list(map(lambda img: clahe.apply(np.reshape(img, (32, 32))), X_test_grayscale)))
In [479]:
show_image_list(X_train_grayscale_equalized[1640:1650], y_train[1640:1650], "Equalized images", fig_size=(15, 10), cols=10, show_ticks=False)

Model evaluation

In [480]:
# Let's normalise our images beforehand
X_train_grayscale_equalized_normalised = normalise_images(X_train_grayscale_equalized, X_train_grayscale_equalized)
X_valid_grayscale_equalized_normalised = normalise_images(X_valid_grayscale_equalized, X_train_grayscale_equalized)
X_test_grayscale_equalized_normalised = normalise_images(X_test_grayscale_equalized, X_train_grayscale_equalized)
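The exact `normalise_images` implementation lives in `utils.py`; a plausible sketch consistent with how it is called above, where the second argument supplies the statistics (the training set, so validation and test share the same scaling), might look like this. The helper name `normalise` is our own for illustration.

```python
import numpy as np

def normalise(images, stats_source):
    # Zero-center and scale using the mean/std of stats_source, which is
    # typically the TRAINING set even when normalising validation or test
    # data, so all splits are transformed identically.
    mean = stats_source.mean()
    std = stats_source.std()
    return (images.astype(np.float32) - mean) / std

rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(100, 32, 32)).astype(np.float32)
X_train_n = normalise(X_train, X_train)
```

Normalised training data then has roughly zero mean and unit variance, which generally speeds up and stabilises gradient-based training.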
In [481]:
# Let's not forget to reshape our images
X_train_grayscale_equalized = np.reshape(X_train_grayscale_equalized, (X_train_grayscale_equalized.shape[0], 32, 32, 1))
X_valid_grayscale_equalized = np.reshape(X_valid_grayscale_equalized, (X_valid_grayscale_equalized.shape[0], 32, 32, 1))
X_test_grayscale_equalized = np.reshape(X_test_grayscale_equalized, (X_test_grayscale_equalized.shape[0], 32, 32, 1))

X_train_grayscale_equalized_normalised = np.reshape(X_train_grayscale_equalized_normalised, (X_train_grayscale_equalized_normalised.shape[0], 32, 32, 1))
X_valid_grayscale_equalized_normalised = np.reshape(X_valid_grayscale_equalized_normalised, (X_valid_grayscale_equalized_normalised.shape[0], 32, 32, 1))
X_test_grayscale_equalized_normalised = np.reshape(X_test_grayscale_equalized_normalised, (X_test_grayscale_equalized_normalised.shape[0], 32, 32, 1))

Normalised Images With No Dropout

In [488]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Grayscale_CLAHE_Norm_3x3", [32, 32, 1], [3, 32, 3], [120, 84], n_classes, [1.0, 1.0])
mc_5x5 = ModelConfig(EdLeNet, "EdLeNet_Grayscale_CLAHE_Norm_5x5", [32, 32, 1], [5, 32, 2], [120, 84], n_classes, [1.0, 1.0])
In [489]:
me_g_clahe_norm_3x3 = ModelExecutor(mc_3x3)
me_g_clahe_norm_5x5 = ModelExecutor(mc_5x5)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
[EdLeNet] Building neural network [conv layers=2, conv filter size=5, conv start depth=32, fc layers=2]
In [490]:
(g_clahe_norm_3x3_tr_metrics, g_clahe_norm_3x3_val_metrics, g_clahe_norm_3x3_duration) = me_g_clahe_norm_3x3.train_model(X_train_grayscale_equalized_normalised, y_train, X_valid_grayscale_equalized_normalised, y_valid, epochs=100)
(g_clahe_norm_3x3_ts_accuracy, g_clahe_norm_3x3_ts_loss, g_clahe_norm_3x3_ts_duration) =  me_g_clahe_norm_3x3.test_model(X_test_grayscale_equalized_normalised, y_test)
Training EdLeNet_Grayscale_CLAHE_Norm_3x3 [epochs=100, batch_size=512]...

[1]	total=5.339s | train: time=3.284s, loss=1.0458, acc=0.6937 | val: time=2.055s, loss=1.2851, acc=0.6422
[10]	total=5.149s | train: time=3.111s, loss=0.0199, acc=0.9967 | val: time=2.037s, loss=0.2177, acc=0.9363
[20]	total=5.145s | train: time=3.108s, loss=0.0011, acc=1.0000 | val: time=2.037s, loss=0.2412, acc=0.9442
[30]	total=5.149s | train: time=3.109s, loss=0.0015, acc=1.0000 | val: time=2.040s, loss=0.2197, acc=0.9463
[40]	total=5.149s | train: time=3.109s, loss=0.0003, acc=1.0000 | val: time=2.041s, loss=0.2435, acc=0.9497
[50]	total=5.146s | train: time=3.106s, loss=0.0001, acc=1.0000 | val: time=2.040s, loss=0.2561, acc=0.9494
[60]	total=5.149s | train: time=3.111s, loss=0.0001, acc=1.0000 | val: time=2.039s, loss=0.2645, acc=0.9508
[70]	total=5.145s | train: time=3.110s, loss=0.0000, acc=1.0000 | val: time=2.035s, loss=0.2788, acc=0.9512
[80]	total=5.148s | train: time=3.110s, loss=0.0000, acc=1.0000 | val: time=2.038s, loss=0.2866, acc=0.9512
[90]	total=5.146s | train: time=3.108s, loss=0.0000, acc=1.0000 | val: time=2.038s, loss=0.2935, acc=0.9503
[100]	total=5.145s | train: time=3.106s, loss=0.0000, acc=1.0000 | val: time=2.039s, loss=0.3051, acc=0.9512
Model ./models/EdLeNet_Grayscale_CLAHE_Norm_3x3.chkpt saved
[EdLeNet_Grayscale_CLAHE_Norm_3x3 - Test Set]	time=0.676s, loss=0.5438, acc=0.9339
In [491]:
(g_clahe_norm_5x5_tr_metrics, g_clahe_norm_5x5_val_metrics, g_clahe_norm_5x5_duration) = me_g_clahe_norm_5x5.train_model(X_train_grayscale_equalized_normalised, y_train, X_valid_grayscale_equalized_normalised, y_valid, epochs=100)
(g_clahe_norm_5x5_ts_accuracy, g_clahe_norm_5x5_ts_loss, g_clahe_norm_5x5_ts_duration) =  me_g_clahe_norm_5x5.test_model(X_test_grayscale_equalized_normalised, y_test)
Training EdLeNet_Grayscale_CLAHE_Norm_5x5 [epochs=100, batch_size=512]...

[1]	total=4.224s | train: time=2.520s, loss=0.6980, acc=0.8037 | val: time=1.705s, loss=0.9092, acc=0.7540
[10]	total=4.197s | train: time=2.505s, loss=0.0132, acc=0.9981 | val: time=1.693s, loss=0.2573, acc=0.9356
[20]	total=4.194s | train: time=2.501s, loss=0.0008, acc=1.0000 | val: time=1.693s, loss=0.2849, acc=0.9415
[30]	total=4.197s | train: time=2.501s, loss=0.0003, acc=1.0000 | val: time=1.696s, loss=0.3122, acc=0.9424
[40]	total=4.199s | train: time=2.504s, loss=0.0001, acc=1.0000 | val: time=1.695s, loss=0.3308, acc=0.9417
[50]	total=4.196s | train: time=2.503s, loss=0.0001, acc=1.0000 | val: time=1.693s, loss=0.3494, acc=0.9420
[60]	total=4.198s | train: time=2.503s, loss=0.0000, acc=1.0000 | val: time=1.695s, loss=0.3608, acc=0.9413
[70]	total=4.200s | train: time=2.506s, loss=0.0000, acc=1.0000 | val: time=1.694s, loss=0.3776, acc=0.9426
[80]	total=4.201s | train: time=2.508s, loss=0.0000, acc=1.0000 | val: time=1.693s, loss=0.3941, acc=0.9429
[90]	total=4.196s | train: time=2.502s, loss=0.0000, acc=1.0000 | val: time=1.694s, loss=0.4009, acc=0.9438
[100]	total=4.199s | train: time=2.503s, loss=0.0000, acc=1.0000 | val: time=1.696s, loss=0.4174, acc=0.9429
Model ./models/EdLeNet_Grayscale_CLAHE_Norm_5x5.chkpt saved
[EdLeNet_Grayscale_CLAHE_Norm_5x5 - Test Set]	time=0.563s, loss=0.6130, acc=0.9291
In [492]:
metrics_arr = [g_clahe_norm_5x5_tr_metrics, g_clahe_norm_5x5_val_metrics, g_clahe_norm_3x3_tr_metrics, g_clahe_norm_3x3_val_metrics]
lbs = ["5x5 training", "5x5 validation", "3x3 training", "3x3 validation "]
plot_model_results(metrics_arr, [2, 1], lbs, ["Epochs", "Epochs"], ["Accuracy", "Loss"], 
                   ["Accuracy vs Epochs", 
                    "Loss vs Epochs"],
                   "Grayscale Histogram-Equalized (Normalised)  - Accuracy and Loss of models"
                    , fig_size=(17, 5))
Length of axis: (2,)

The 3x3 model performs better, topping 93% test accuracy, but the Loss vs Epochs graph shows our models' validation losses increasing as the epochs go by, which indicates that we are overfitting and not working hard enough to extract features that generalise. We will try applying aggressive dropout values, in the hope that this reduces the loss and improves the accuracy.

Normalised Images With Dropout

We apply our best model with aggressive dropout at both levels (p-conv=0.5, p-fc=0.5) on normalised histogram-equalised images.

In [522]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Grayscale_CLAHE_Norm_3x3_Dropout_0.50", [32, 32, 1], [3, 32, 3], [120, 84], n_classes, [0.5, 0.5])
In [523]:
me_g_clahe_norm_drpt_0_50_3x3 = ModelExecutor(mc_3x3)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
In [ ]:
(g_clahe_norm_drpt_0_50_3x3_tr_metrics, g_clahe_norm_drpt_0_50_3x3_val_metrics, g_clahe_norm_drpt_0_50_3x3_duration) = me_g_clahe_norm_drpt_0_50_3x3.train_model(X_train_grayscale_equalized_normalised, y_train, X_valid_grayscale_equalized_normalised, y_valid, epochs=500)
(g_clahe_norm_drpt_0_50_3x3_ts_accuracy, g_clahe_norm_drpt_0_50_3x3_ts_loss, g_clahe_norm_drpt_0_50_ts_3x3_duration) =  me_g_clahe_norm_drpt_0_50_3x3.test_model(X_test_grayscale_equalized_normalised, y_test)
Training EdLeNet_Grayscale_CLAHE_Norm_3x3_Dropout_0.50 [epochs=500, batch_size=512]...

[1]	total=5.319s | train: time=3.265s, loss=3.6650, acc=0.0614 | val: time=2.053s, loss=3.6771, acc=0.0580
[10]	total=5.150s | train: time=3.111s, loss=1.3985, acc=0.5385 | val: time=2.039s, loss=1.4661, acc=0.5104
[20]	total=5.152s | train: time=3.111s, loss=0.5728, acc=0.8126 | val: time=2.041s, loss=0.6032, acc=0.8054
[30]	total=5.145s | train: time=3.104s, loss=0.3309, acc=0.8708 | val: time=2.041s, loss=0.3571, acc=0.8556
[40]	total=5.144s | train: time=3.104s, loss=0.2277, acc=0.9048 | val: time=2.040s, loss=0.2621, acc=0.8807
[50]	total=5.144s | train: time=3.105s, loss=0.1474, acc=0.9469 | val: time=2.039s, loss=0.1761, acc=0.9338
[60]	total=5.145s | train: time=3.105s, loss=0.0992, acc=0.9756 | val: time=2.041s, loss=0.1299, acc=0.9599
[70]	total=5.149s | train: time=3.107s, loss=0.0620, acc=0.9879 | val: time=2.042s, loss=0.0919, acc=0.9746
[80]	total=5.150s | train: time=3.109s, loss=0.0455, acc=0.9908 | val: time=2.041s, loss=0.0776, acc=0.9791
[90]	total=5.143s | train: time=3.106s, loss=0.0351, acc=0.9927 | val: time=2.038s, loss=0.0661, acc=0.9828
[100]	total=5.150s | train: time=3.110s, loss=0.0282, acc=0.9942 | val: time=2.040s, loss=0.0591, acc=0.9819
[110]	total=5.144s | train: time=3.102s, loss=0.0249, acc=0.9947 | val: time=2.043s, loss=0.0594, acc=0.9837
[120]	total=5.146s | train: time=3.105s, loss=0.0203, acc=0.9955 | val: time=2.041s, loss=0.0481, acc=0.9875
[130]	total=5.144s | train: time=3.105s, loss=0.0180, acc=0.9959 | val: time=2.039s, loss=0.0523, acc=0.9839
[140]	total=5.144s | train: time=3.106s, loss=0.0167, acc=0.9966 | val: time=2.038s, loss=0.0483, acc=0.9862
[150]	total=5.151s | train: time=3.110s, loss=0.0148, acc=0.9966 | val: time=2.040s, loss=0.0427, acc=0.9868
[160]	total=5.144s | train: time=3.106s, loss=0.0135, acc=0.9969 | val: time=2.038s, loss=0.0439, acc=0.9868
[170]	total=5.142s | train: time=3.105s, loss=0.0133, acc=0.9970 | val: time=2.037s, loss=0.0457, acc=0.9873
[180]	total=5.145s | train: time=3.109s, loss=0.0122, acc=0.9972 | val: time=2.036s, loss=0.0428, acc=0.9864
[190]	total=5.144s | train: time=3.108s, loss=0.0117, acc=0.9972 | val: time=2.036s, loss=0.0422, acc=0.9882
[200]	total=5.148s | train: time=3.107s, loss=0.0100, acc=0.9978 | val: time=2.041s, loss=0.0418, acc=0.9871
[210]	total=5.141s | train: time=3.103s, loss=0.0104, acc=0.9972 | val: time=2.038s, loss=0.0417, acc=0.9868
[220]	total=5.145s | train: time=3.105s, loss=0.0102, acc=0.9976 | val: time=2.040s, loss=0.0407, acc=0.9871
[230]	total=5.141s | train: time=3.103s, loss=0.0091, acc=0.9978 | val: time=2.038s, loss=0.0388, acc=0.9880
[240]	total=5.138s | train: time=3.100s, loss=0.0094, acc=0.9977 | val: time=2.038s, loss=0.0419, acc=0.9873
[250]	total=5.140s | train: time=3.101s, loss=0.0107, acc=0.9969 | val: time=2.038s, loss=0.0479, acc=0.9859
[260]	total=5.146s | train: time=3.107s, loss=0.0089, acc=0.9978 | val: time=2.038s, loss=0.0401, acc=0.9880
[270]	total=5.141s | train: time=3.106s, loss=0.0086, acc=0.9980 | val: time=2.036s, loss=0.0399, acc=0.9884
[280]	total=5.144s | train: time=3.103s, loss=0.0080, acc=0.9982 | val: time=2.041s, loss=0.0399, acc=0.9878
[290]	total=5.146s | train: time=3.103s, loss=0.0071, acc=0.9984 | val: time=2.043s, loss=0.0383, acc=0.9878
[300]	total=5.148s | train: time=3.106s, loss=0.0071, acc=0.9984 | val: time=2.041s, loss=0.0383, acc=0.9884
[310]	total=5.148s | train: time=3.108s, loss=0.0074, acc=0.9982 | val: time=2.040s, loss=0.0389, acc=0.9868
[320]	total=5.141s | train: time=3.103s, loss=0.0070, acc=0.9983 | val: time=2.039s, loss=0.0375, acc=0.9864
[330]	total=5.145s | train: time=3.109s, loss=0.0068, acc=0.9986 | val: time=2.036s, loss=0.0398, acc=0.9868
[340]	total=5.146s | train: time=3.107s, loss=0.0068, acc=0.9984 | val: time=2.040s, loss=0.0383, acc=0.9875
[350]	total=5.151s | train: time=3.109s, loss=0.0074, acc=0.9981 | val: time=2.042s, loss=0.0396, acc=0.9875
[360]	total=5.146s | train: time=3.103s, loss=0.0062, acc=0.9988 | val: time=2.042s, loss=0.0377, acc=0.9878
[370]	total=5.149s | train: time=3.106s, loss=0.0059, acc=0.9987 | val: time=2.042s, loss=0.0369, acc=0.9887
[380]	total=5.137s | train: time=3.099s, loss=0.0060, acc=0.9985 | val: time=2.038s, loss=0.0376, acc=0.9868
[390]	total=5.145s | train: time=3.106s, loss=0.0055, acc=0.9987 | val: time=2.039s, loss=0.0348, acc=0.9896
[400]	total=5.143s | train: time=3.104s, loss=0.0062, acc=0.9984 | val: time=2.039s, loss=0.0367, acc=0.9889
[410]	total=5.148s | train: time=3.108s, loss=0.0057, acc=0.9988 | val: time=2.040s, loss=0.0349, acc=0.9880
[420]	total=5.141s | train: time=3.102s, loss=0.0062, acc=0.9985 | val: time=2.039s, loss=0.0363, acc=0.9891
[430]	total=5.148s | train: time=3.108s, loss=0.0054, acc=0.9989 | val: time=2.040s, loss=0.0346, acc=0.9889
[440]	total=5.148s | train: time=3.108s, loss=0.0053, acc=0.9988 | val: time=2.039s, loss=0.0331, acc=0.9889
[450]	total=5.147s | train: time=3.104s, loss=0.0052, acc=0.9988 | val: time=2.043s, loss=0.0395, acc=0.9864
[490]	total=5.145s | train: time=3.107s, loss=0.0056, acc=0.9986 | val: time=2.039s, loss=0.0400, acc=0.9857
[500]	total=5.148s | train: time=3.109s, loss=0.0048, acc=0.9989 | val: time=2.038s, loss=0.0347, acc=0.9887
Model ./models/EdLeNet_Grayscale_CLAHE_Norm_3x3_Dropout_0.50.chkpt saved
[EdLeNet_Grayscale_CLAHE_Norm_3x3_Dropout_0.50 - Test Set]	time=0.676s, loss=0.0987, acc=0.9758
In [ ]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Grayscale_CLAHE_Norm_Take-2_3x3_Dropout_0.50", [32, 32, 1], [3, 32, 3], [120, 84], n_classes, [0.6, 0.5])
In [ ]:
me_g_clahe_norm_take2_drpt_0_50_3x3 = ModelExecutor(mc_3x3)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
In [ ]:
(g_clahe_norm_take2_drpt_0_50_3x3_tr_metrics, g_clahe_norm_take2_drpt_0_50_3x3_val_metrics, g_clahe_norm_take2_drpt_0_50_3x3_duration)    =  me_g_clahe_norm_take2_drpt_0_50_3x3.train_model(X_train_grayscale_equalized_normalised, y_train, X_valid_grayscale_equalized_normalised, y_valid, epochs=500)
(g_clahe_norm_take2_drpt_0_50_3x3_ts_accuracy, g_clahe_norm_take2_drpt_0_50_3x3_ts_loss,    g_clahe_norm_take2_drpt_0_50_ts_3x3_duration) =  me_g_clahe_norm_take2_drpt_0_50_3x3.test_model(X_test_grayscale_equalized_normalised, y_test)
Training EdLeNet_Grayscale_CLAHE_Norm_Take-2_3x3_Dropout_0.50 [epochs=500, batch_size=512]...

[1]	total=5.194s | train: time=3.137s, loss=3.6254, acc=0.0662 | val: time=2.058s, loss=3.6405, acc=0.0655
[10]	total=5.155s | train: time=3.115s, loss=0.8645, acc=0.7121 | val: time=2.040s, loss=0.9159, acc=0.6819
[20]	total=5.145s | train: time=3.106s, loss=0.2476, acc=0.9010 | val: time=2.040s, loss=0.2838, acc=0.8796
[30]	total=5.147s | train: time=3.106s, loss=0.1205, acc=0.9670 | val: time=2.041s, loss=0.1471, acc=0.9599
[40]	total=5.151s | train: time=3.113s, loss=0.0471, acc=0.9904 | val: time=2.039s, loss=0.0764, acc=0.9782
[50]	total=5.148s | train: time=3.109s, loss=0.0254, acc=0.9945 | val: time=2.039s, loss=0.0601, acc=0.9823
[60]	total=5.150s | train: time=3.110s, loss=0.0157, acc=0.9966 | val: time=2.041s, loss=0.0473, acc=0.9848
[70]	total=5.148s | train: time=3.105s, loss=0.0122, acc=0.9972 | val: time=2.043s, loss=0.0441, acc=0.9857
[80]	total=5.147s | train: time=3.107s, loss=0.0084, acc=0.9978 | val: time=2.039s, loss=0.0419, acc=0.9878
[90]	total=5.145s | train: time=3.106s, loss=0.0069, acc=0.9983 | val: time=2.039s, loss=0.0438, acc=0.9850
[100]	total=5.148s | train: time=3.110s, loss=0.0062, acc=0.9985 | val: time=2.038s, loss=0.0482, acc=0.9846
[110]	total=5.148s | train: time=3.104s, loss=0.0048, acc=0.9988 | val: time=2.044s, loss=0.0415, acc=0.9866
[120]	total=5.149s | train: time=3.111s, loss=0.0045, acc=0.9988 | val: time=2.038s, loss=0.0407, acc=0.9864
[130]	total=5.147s | train: time=3.106s, loss=0.0035, acc=0.9990 | val: time=2.041s, loss=0.0448, acc=0.9866
[140]	total=5.149s | train: time=3.108s, loss=0.0030, acc=0.9994 | val: time=2.041s, loss=0.0385, acc=0.9878
[150]	total=5.145s | train: time=3.107s, loss=0.0024, acc=0.9994 | val: time=2.038s, loss=0.0386, acc=0.9873
[160]	total=5.144s | train: time=3.105s, loss=0.0027, acc=0.9992 | val: time=2.039s, loss=0.0377, acc=0.9880
[170]	total=5.143s | train: time=3.104s, loss=0.0020, acc=0.9995 | val: time=2.039s, loss=0.0365, acc=0.9880
[180]	total=5.145s | train: time=3.105s, loss=0.0021, acc=0.9996 | val: time=2.040s, loss=0.0337, acc=0.9880
[190]	total=5.150s | train: time=3.106s, loss=0.0018, acc=0.9996 | val: time=2.044s, loss=0.0404, acc=0.9875
[200]	total=5.149s | train: time=3.105s, loss=0.0023, acc=0.9993 | val: time=2.044s, loss=0.0306, acc=0.9907
[210]	total=5.147s | train: time=3.109s, loss=0.0020, acc=0.9995 | val: time=2.038s, loss=0.0391, acc=0.9875
[220]	total=5.146s | train: time=3.110s, loss=0.0016, acc=0.9996 | val: time=2.037s, loss=0.0370, acc=0.9893
[230]	total=5.146s | train: time=3.107s, loss=0.0013, acc=0.9998 | val: time=2.039s, loss=0.0361, acc=0.9884
[240]	total=5.142s | train: time=3.105s, loss=0.0014, acc=0.9997 | val: time=2.037s, loss=0.0378, acc=0.9878
[250]	total=5.150s | train: time=3.108s, loss=0.0012, acc=0.9997 | val: time=2.042s, loss=0.0391, acc=0.9887
[260]	total=5.143s | train: time=3.104s, loss=0.0012, acc=0.9997 | val: time=2.039s, loss=0.0337, acc=0.9896
[270]	total=5.142s | train: time=3.103s, loss=0.0012, acc=0.9997 | val: time=2.040s, loss=0.0387, acc=0.9884
[280]	total=5.147s | train: time=3.108s, loss=0.0014, acc=0.9996 | val: time=2.040s, loss=0.0338, acc=0.9891
[290]	total=5.143s | train: time=3.104s, loss=0.0012, acc=0.9996 | val: time=2.039s, loss=0.0378, acc=0.9887
[300]	total=5.148s | train: time=3.109s, loss=0.0012, acc=0.9997 | val: time=2.039s, loss=0.0465, acc=0.9868
[310]	total=5.145s | train: time=3.103s, loss=0.0010, acc=0.9997 | val: time=2.042s, loss=0.0398, acc=0.9878
[320]	total=5.146s | train: time=3.106s, loss=0.0010, acc=0.9999 | val: time=2.040s, loss=0.0406, acc=0.9875
[330]	total=5.146s | train: time=3.110s, loss=0.0014, acc=0.9997 | val: time=2.035s, loss=0.0374, acc=0.9875
[340]	total=5.149s | train: time=3.109s, loss=0.0010, acc=0.9998 | val: time=2.040s, loss=0.0376, acc=0.9891
[350]	total=5.141s | train: time=3.102s, loss=0.0013, acc=0.9997 | val: time=2.040s, loss=0.0344, acc=0.9900
[360]	total=5.141s | train: time=3.104s, loss=0.0010, acc=0.9998 | val: time=2.037s, loss=0.0377, acc=0.9891
[370]	total=5.147s | train: time=3.107s, loss=0.0011, acc=0.9998 | val: time=2.040s, loss=0.0384, acc=0.9887
[380]	total=5.142s | train: time=3.103s, loss=0.0009, acc=0.9998 | val: time=2.040s, loss=0.0345, acc=0.9900
[390]	total=5.148s | train: time=3.104s, loss=0.0008, acc=0.9998 | val: time=2.043s, loss=0.0379, acc=0.9889
[400]	total=5.140s | train: time=3.103s, loss=0.0007, acc=0.9999 | val: time=2.037s, loss=0.0376, acc=0.9887
[410]	total=5.148s | train: time=3.107s, loss=0.0010, acc=0.9999 | val: time=2.041s, loss=0.0403, acc=0.9889
[420]	total=5.154s | train: time=3.108s, loss=0.0009, acc=0.9998 | val: time=2.045s, loss=0.0345, acc=0.9898
[430]	total=5.141s | train: time=3.102s, loss=0.0006, acc=0.9999 | val: time=2.038s, loss=0.0388, acc=0.9882
[440]	total=5.150s | train: time=3.109s, loss=0.0011, acc=0.9997 | val: time=2.040s, loss=0.0468, acc=0.9868
[450]	total=5.144s | train: time=3.105s, loss=0.0007, acc=0.9998 | val: time=2.040s, loss=0.0360, acc=0.9891
[460]	total=5.143s | train: time=3.104s, loss=0.0006, acc=0.9999 | val: time=2.039s, loss=0.0338, acc=0.9896
[470]	total=5.143s | train: time=3.107s, loss=0.0006, acc=0.9999 | val: time=2.036s, loss=0.0396, acc=0.9887
[480]	total=5.149s | train: time=3.106s, loss=0.0009, acc=0.9998 | val: time=2.042s, loss=0.0355, acc=0.9884
[490]	total=5.148s | train: time=3.106s, loss=0.0007, acc=0.9998 | val: time=2.042s, loss=0.0390, acc=0.9884
[500]	total=5.148s | train: time=3.104s, loss=0.0006, acc=0.9999 | val: time=2.044s, loss=0.0420, acc=0.9862
Model ./models/EdLeNet_Grayscale_CLAHE_Norm_Take-2_3x3_Dropout_0.50.chkpt saved
[EdLeNet_Grayscale_CLAHE_Norm_Take-2_3x3_Dropout_0.50 - Test Set]	time=0.675s, loss=0.0890, acc=0.9775

The 3x3 CLAHE model (Take 2) achieves a remarkable 97.75% accuracy with a loss under 0.1. That is our best model so far.

Analysis of 3x3 Model performance

In [552]:
test_preds = me_g_clahe_norm_take2_drpt_0_50_3x3.predict(X_test_grayscale_equalized_normalised)
In [553]:
good_test_preds = test_preds == y_test
bad_test_preds = test_preds != y_test
In [554]:
bad_test_pred_imgs = X_test_grayscale_equalized[bad_test_preds]
good_test_pred_imgs = X_test_grayscale_equalized[good_test_preds]
bad_test_pred_correct_lb = y_test[bad_test_preds]
bad_test_pred_incorrect_lb = test_preds[bad_test_preds]
In [555]:
print("Number of good predictions: {0}/{1}".format(len(good_test_pred_imgs), len(y_test)))
print("Number of bad predictions: {0}/{1}".format(len(bad_test_pred_imgs), len(y_test)))
Number of good predictions: 12346/12630
Number of bad predictions: 284/12630

We have 284 images where our prediction was incorrect. Let's visualize some of them to better understand what's going on.

In [556]:
count = 10
idx = np.random.randint(0, high=len(bad_test_pred_imgs), size=10)
show_image_list(bad_test_pred_imgs[idx], bad_test_pred_incorrect_lb[idx], "Bad test predictions", cols=count)

Some of the images are very blurry, despite our histogram equalization, while others seem distorted. We turn to data augmentation to address these issues.

Data Augmentation

We observed earlier that the data presented a glaring imbalance across the 43 classes. Yet this does not seem to be a crippling problem, as we were able to reach very high accuracy despite it. We also noticed that some images in the test set are distorted. We therefore apply data augmentation techniques in an attempt to:

  1. Minimize imbalances across classes
  2. Extend the dataset with additional pictures under different lighting conditions and orientations
  3. Improve the model's ability to generalize
  4. Improve test and validation accuracy, especially on distorted images
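The `augment_imgs_until_n` helper used in the loop below is defined in `utils` and not shown here. A minimal numpy-only sketch of the idea (hypothetical, deliberately simplified transforms; a real pipeline would use rotations and affine warps, e.g. via OpenCV) could look like:

```python
import numpy as np

def augment_imgs_until_n(imgs, n, p=1.0, rng=None):
    """Sketch: produce n augmented images by repeatedly sampling from imgs
    and applying small random perturbations with probability p."""
    rng = np.random.default_rng(rng)
    out = np.empty((n,) + imgs.shape[1:], dtype=imgs.dtype)
    for i in range(n):
        img = imgs[rng.integers(len(imgs))].astype(np.float32)
        if rng.random() < p:
            # small random translation (wrap-around shift as a stand-in for a warp)
            dx, dy = rng.integers(-2, 3, size=2)
            img = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            # random brightness scaling
            img = np.clip(img * rng.uniform(0.8, 1.2), 0, 255)
        out[i] = img.astype(imgs.dtype)
    return out

batch = np.random.randint(0, 255, size=(12, 32, 32, 3), dtype=np.uint8)
augs = augment_imgs_until_n(batch, 30)
print(augs.shape)  # (30, 32, 32, 3)
```

Note that horizontal flips are deliberately absent: mirroring a traffic sign can change its class.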
In [528]:
X_train_augs = X_train
y_train_augs = y_train
In [529]:
# This loop augments images per label group
for (lid, lbl), group in X_train_group_by_label:
    #print("[{0}] : {1}".format(lid, lbl))   
    group_count = group['img_id'].size
    idx = group['img_id'].values
    imgs = X_train[idx]
    
    # Take a different population of the subset depending on how many images we have already
    # and vary the number of augmentations depending on size of label group
    pt_spacing = 1.0
    p = 1.0
    n = group_count * 2
    
    if group_count > 500 and group_count < 1000:
        pt_spacing = 3.0
        n = group_count * 0.30
    elif group_count >= 1000 and group_count < 2000:
        pt_spacing = 10.0
        n = group_count * 0.20
    elif group_count >= 2000:
        pt_spacing = 20.0
        n = group_count * 0.10
    
    n = int(n)
    
    space_interval = int(group_count / pt_spacing)
    
    rand_idx = np.linspace(0, group_count, num=space_interval, endpoint=False, dtype='int')
    
    selected_rows = group.iloc[rand_idx]
    selected_img = np.array(list(map(lambda img_id: X_train[img_id], selected_rows['img_id'])))
    
    augs = augment_imgs_until_n(selected_img, n, p)
    X_train_augs = np.concatenate((X_train_augs, augs))
    y_train_augs = np.concatenate((y_train_augs, np.repeat(lid, n)))
    
print("New Augmented arrays shape: {0} and {1}".format(X_train_augs.shape, y_train_augs.shape))
New Augmented arrays shape: (52870, 32, 32, 3) and (52870,)
In [530]:
X_train_augs_id_to_label = group_img_id_to_lbl(y_train_augs, sign_names)
X_train_augs_group_by_label_count = group_img_id_to_lb_count(X_train_augs_id_to_label)

# Now count the number of images for each class and plot the results so that we understand the distribution
X_train_augs_group_by_label_count.plot(kind='bar', figsize=(15, 7))
Out[530]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd580400be0>

We have managed to mitigate the gaping imbalance we originally observed, but it is still clearly visible. We deliberately tempered our augmentation, as we did not want the model to learn overwhelmingly from augmented images to the detriment of real ones. That is admittedly a hard balance to strike, but we believe the results are interesting: every class now has at least 10% of its dataset augmented. Our assumption is that the network should now perform better; let's put it to the test. Below are visuals of the new dataset, with some images having been augmented by the steps above.
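The "at least 10% augmented per class" claim can be checked directly from the label arrays. A standalone sketch (with toy arrays standing in for `y_train` and `y_train_augs`):

```python
import numpy as np

def augmentation_share(y_before, y_after, n_classes):
    """Fraction of each class's final dataset that is augmented (new) data."""
    before = np.bincount(y_before, minlength=n_classes)
    after = np.bincount(y_after, minlength=n_classes)
    added = after - before
    return added / np.maximum(after, 1)

# toy example: class 0 grows from 4 to 5 images, class 1 from 2 to 4
y_before = np.array([0, 0, 0, 0, 1, 1])
y_after = np.concatenate([y_before, [0, 1, 1]])
share = augmentation_share(y_before, y_after, 2)
print(share)  # [0.2 0.5]
```

On the real arrays, `augmentation_share(y_train, y_train_augs, 43).min()` should come out at or above 0.10 if the claim holds.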

In [533]:
# Let's plot images from each of these classes
X_train_augs_group_by_label = X_train_augs_id_to_label.groupby(["label_id", "label_name"])
In [534]:
show_random_dataset_images(X_train_augs_group_by_label, X_train_augs)

Evaluating model on augmented dataset

For this part, we focus exclusively on grayscale images, as they seem to produce better performance. Color images may perform comparably, but since we also plan to apply histogram equalization, the images must first be converted to grayscale.

Grayscale Augmented Dataset

In [535]:
# First convert to grayscale
X_train_augs_grayscale = np.asarray(list(map(lambda img: to_grayscale(img), X_train_augs))) 
In [536]:
# Then histogram equalize the grayscale images
X_train_augs_grayscale_equalized  = np.asarray(list(map(lambda img: clahe.apply(np.reshape(img, (32, 32))), X_train_augs_grayscale )))
In [537]:
# Now normalise using the mean and std of the training set
X_train_augs_grayscale_equalized_normalised = normalise_images(X_train_augs_grayscale_equalized, X_train_augs_grayscale_equalized)
X_valid_augs_grayscale_equalized_normalised = normalise_images(X_valid_grayscale_equalized, X_train_augs_grayscale_equalized)
X_test_augs_grayscale_equalized_normalised = normalise_images(X_test_grayscale_equalized, X_train_augs_grayscale_equalized)
In [538]:
# Reshaping our inputs to a format that tensorflow accepts (i.e. a 4D tensor)
X_train_augs_grayscale_equalized_normalised = np.reshape(X_train_augs_grayscale_equalized_normalised, (X_train_augs_grayscale_equalized_normalised.shape[0], 32, 32, 1))
X_valid_augs_grayscale_equalized_normalised = np.reshape(X_valid_augs_grayscale_equalized_normalised, (X_valid_augs_grayscale_equalized_normalised.shape[0], 32, 32, 1))
X_test_augs_grayscale_equalized_normalised = np.reshape(X_test_augs_grayscale_equalized_normalised,   (X_test_augs_grayscale_equalized_normalised.shape[0], 32, 32, 1))

No Dropout

In [542]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Augs_Grayscale_CLAHE_Norm_3x3", [32, 32, 1], [3, 32, 3], [120, 84], n_classes, [1.0, 1.0])
In [543]:
me_g_augs_clahe_norm_3x3 = ModelExecutor(mc_3x3)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
In [544]:
(g_augs_clahe_norm_3x3_tr_metrics,  g_augs_clahe_norm_3x3_val_metrics, g_augs_clahe_norm_3x3_duration) = me_g_augs_clahe_norm_3x3.train_model(X_train_augs_grayscale_equalized_normalised, y_train_augs, X_valid_augs_grayscale_equalized_normalised, y_valid, epochs=50)
Training EdLeNet_Augs_Grayscale_CLAHE_Norm_3x3 [epochs=50, batch_size=512]...

[1]	total=7.900s | train: time=4.909s, loss=1.7946, acc=0.5232 | val: time=2.991s, loss=1.1959, acc=0.6551
[10]	total=7.722s | train: time=4.736s, loss=0.5372, acc=0.8462 | val: time=2.986s, loss=0.2670, acc=0.9209
[20]	total=7.708s | train: time=4.729s, loss=0.2284, acc=0.9322 | val: time=2.978s, loss=0.3126, acc=0.9200
[30]	total=7.703s | train: time=4.721s, loss=0.0813, acc=0.9769 | val: time=2.982s, loss=0.4334, acc=0.9263
[40]	total=7.709s | train: time=4.730s, loss=0.0389, acc=0.9889 | val: time=2.979s, loss=0.5252, acc=0.9193
[50]	total=7.694s | train: time=4.720s, loss=0.0060, acc=0.9995 | val: time=2.974s, loss=0.5779, acc=0.9274
Model ./models/EdLeNet_Augs_Grayscale_CLAHE_Norm_3x3.chkpt saved
In [545]:
(g_augs_clahe_norm_3x3_ts_accuracy, g_augs_clahe_norm_3x3_ts_loss,     g_augs_clahe_norm_3x3_ts_duration) =  me_g_augs_clahe_norm_3x3.test_model(X_test_augs_grayscale_equalized_normalised, y_test)
[EdLeNet_Augs_Grayscale_CLAHE_Norm_3x3 - Test Set]	time=0.792s, loss=0.8904, acc=0.9139

We only ran 50 epochs this time and the results look very encouraging. Let's now try more aggressive values for dropout to see how the model adapts.

Dropout - Keep Fully Connected Layer Weights with p=0.5

In [606]:
mc_3x3 = ModelConfig(EdLeNet, "EdLeNet_Augs_Grayscale_CLAHE_Norm_3x3_Dropout_0.50", [32, 32, 1], [3, 32, 3], [120, 84], n_classes, [0.65, 0.5])
In [607]:
me_g_augs_clahe_norm_drpt_0_50_3x3 = ModelExecutor(mc_3x3)
[EdLeNet] Building neural network [conv layers=3, conv filter size=3, conv start depth=32, fc layers=2]
In [ ]:
(g_augs_clahe_norm_drpt_0_50_3x3_tr_metrics,  g_augs_clahe_norm_drpt_0_50_3x3_val_metrics, g_augs_clahe_norm_drpt_0_50_3x3_duration) = me_g_augs_clahe_norm_drpt_0_50_3x3.train_model(X_train_augs_grayscale_equalized_normalised, y_train_augs, X_valid_augs_grayscale_equalized_normalised, y_valid, epochs=600)
(g_augs_clahe_norm_drpt_0_50_3x3_ts_accuracy, g_augs_clahe_norm_drpt_0_50_3x3_ts_loss,     g_augs_clahe_norm_drpt_0_50_3x3_ts_duration) =  me_g_augs_clahe_norm_drpt_0_50_3x3.test_model(X_test_augs_grayscale_equalized_normalised, y_test)
Training EdLeNet_Augs_Grayscale_CLAHE_Norm_3x3_Dropout_0.50 [epochs=600, batch_size=512]...

[1]	total=7.885s | train: time=4.898s, loss=3.6795, acc=0.0578 | val: time=2.987s, loss=3.6439, acc=0.0769
[10]	total=7.705s | train: time=4.726s, loss=1.3930, acc=0.5884 | val: time=2.978s, loss=0.7264, acc=0.7524
[20]	total=7.696s | train: time=4.720s, loss=0.9150, acc=0.7341 | val: time=2.975s, loss=0.2811, acc=0.9054
[30]	total=7.698s | train: time=4.720s, loss=0.6795, acc=0.8052 | val: time=2.977s, loss=0.1471, acc=0.9551
[40]	total=7.693s | train: time=4.718s, loss=0.5587, acc=0.8440 | val: time=2.975s, loss=0.0980, acc=0.9764
[50]	total=7.695s | train: time=4.720s, loss=0.4765, acc=0.8665 | val: time=2.975s, loss=0.0813, acc=0.9776
[60]	total=7.689s | train: time=4.711s, loss=0.4205, acc=0.8803 | val: time=2.977s, loss=0.0790, acc=0.9769
[70]	total=7.695s | train: time=4.718s, loss=0.3837, acc=0.8922 | val: time=2.978s, loss=0.0646, acc=0.9825
[80]	total=7.689s | train: time=4.713s, loss=0.3550, acc=0.8997 | val: time=2.976s, loss=0.0603, acc=0.9839
[90]	total=7.697s | train: time=4.718s, loss=0.3298, acc=0.9082 | val: time=2.978s, loss=0.0610, acc=0.9832
[100]	total=7.693s | train: time=4.717s, loss=0.3117, acc=0.9117 | val: time=2.976s, loss=0.0540, acc=0.9837
[110]	total=7.685s | train: time=4.712s, loss=0.3064, acc=0.9141 | val: time=2.973s, loss=0.0592, acc=0.9810
[120]	total=7.689s | train: time=4.714s, loss=0.2840, acc=0.9209 | val: time=2.974s, loss=0.0521, acc=0.9848
[130]	total=7.690s | train: time=4.713s, loss=0.2736, acc=0.9238 | val: time=2.977s, loss=0.0475, acc=0.9880
[140]	total=7.688s | train: time=4.712s, loss=0.2654, acc=0.9265 | val: time=2.976s, loss=0.0490, acc=0.9846
[150]	total=7.697s | train: time=4.720s, loss=0.2578, acc=0.9276 | val: time=2.977s, loss=0.0482, acc=0.9862
[160]	total=7.689s | train: time=4.718s, loss=0.2672, acc=0.9254 | val: time=2.972s, loss=0.0554, acc=0.9819
[170]	total=7.688s | train: time=4.714s, loss=0.2471, acc=0.9305 | val: time=2.975s, loss=0.0480, acc=0.9848
[180]	total=7.697s | train: time=4.715s, loss=0.2426, acc=0.9324 | val: time=2.982s, loss=0.0474, acc=0.9855
[190]	total=7.679s | train: time=4.706s, loss=0.2361, acc=0.9346 | val: time=2.973s, loss=0.0466, acc=0.9857
[200]	total=7.687s | train: time=4.711s, loss=0.2306, acc=0.9354 | val: time=2.976s, loss=0.0474, acc=0.9853
[210]	total=7.690s | train: time=4.709s, loss=0.2338, acc=0.9346 | val: time=2.981s, loss=0.0542, acc=0.9821
[220]	total=7.691s | train: time=4.715s, loss=0.2233, acc=0.9371 | val: time=2.976s, loss=0.0487, acc=0.9850
[230]	total=7.686s | train: time=4.712s, loss=0.2231, acc=0.9375 | val: time=2.974s, loss=0.0451, acc=0.9841

We were running out of time here, so we had to interrupt the computation. Data augmentation shows a lot of promise, and we will train our model on it in the future.

Future Improvements

While we have obtained very good results, we believe we can achieve even better performance by testing more hyperparameter variations (e.g. the dimensions of the fully connected layers) as well as exploring novel architectures such as GoogLeNet's Inception module, ResNet, or Xception. We also would have liked more time to train the model on the augmented dataset.


Test Model on New Images

In this section, we run the model's predictions on new images of German traffic signs downloaded from the Internet. We then visualise the top 5 softmax probabilities and, finally, show what the model's activation layers look like for the supplied images.

Load and Output the Images

In [557]:
from PIL import Image
import os

def get_imgs_from_folder(path, size=(32, 32), grayscale=False):
    """  
    Returns a list of images from a folder as a numpy array
    """
    img_list = [os.path.join(path,f) for f in os.listdir(path) if f.endswith(".jpg") or f.endswith(".png")]
    imgs = None 
    if grayscale:
        imgs = np.empty([len(img_list), size[0], size[1]], dtype=np.uint8) 
    else:
        imgs = np.empty([len(img_list), size[0], size[1], 3], dtype=np.uint8) 

    for i, img_path in enumerate(img_list):
        img = Image.open(img_path).convert('RGB')
        img = img.resize(size)
        im = np.array(to_grayscale(img)) if grayscale else np.array(img)
        imgs[i] = im

    return imgs
In [558]:
new_imgs_dir = "./custom_images/"
In [559]:
new_imgs = get_imgs_from_folder(new_imgs_dir)
new_imgs.shape
Out[559]:
(5, 32, 32, 3)
In [560]:
new_img_grayscale_clahe = np.asarray(list(map(lambda img: clahe.apply(to_grayscale(img)), new_imgs)))
In [561]:
def class_ids_to_labels(cids):
    return list(map(lambda cid: sign_names[sign_names["ClassId"] == cid] ["SignName"].values[0],  cids))
In [562]:
# Let's also create the labels for our images
new_img_lbs = np.array([8, 12, 15, 25, 16])
In [563]:
new_img_lbs_text = class_ids_to_labels(new_img_lbs)
new_img_lbs_text
Out[563]:
['Speed limit (120km/h)',
 'Priority road',
 'No vehicles',
 'Road work',
 'Vehicles over 3.5 metric tons prohibited']
In [564]:
show_image_list(new_imgs, new_img_lbs_text, "New Traffic Sign Images (Color)", cols=len(new_imgs), 
                fig_size=(6,6), show_ticks=False)
In [565]:
show_image_list(new_img_grayscale_clahe, new_img_lbs_text, "New Traffic Sign Images (Grayscale)", cols=len(new_img_grayscale_clahe), 
                fig_size=(6,6), show_ticks=False)

The images were chosen for the following reasons:

  • They represent different traffic signs that we currently classify
  • They vary in shape and color
  • They are under different lighting conditions (the 4th one has sunlight reflection)
  • They are under different orientations (the 3rd one is slanted)
  • They have different backgrounds
  • The last image is actually a design, not a real photograph, and we wanted to see whether the model still recognizes it
  • Some of them belong to under-represented classes

Predict the Sign Type for Each Image

In [566]:
# First normalise the images
new_img_grayscale_clahe_normalised = normalise_images(new_img_grayscale_clahe, X_train_grayscale_equalized)
In [567]:
# Convert to a format Tensorflow understands
new_img_grayscale_clahe_normalised = np.reshape(new_img_grayscale_clahe_normalised, (new_img_grayscale_clahe_normalised.shape[0], 32, 32, 1))
In [594]:
# Let's run the predictions now
preds = me_g_clahe_norm_take2_drpt_0_50_3x3.predict(new_img_grayscale_clahe_normalised)
In [595]:
preds
Out[595]:
array([ 8, 12, 15, 25, 16])

Analyze Prediction Performance

Calculating accuracy

In [596]:
new_img_grayscale_norm_pred_acc = np.sum(new_img_lbs == preds) / len(preds)
print("[Grayscale Normalised] Prediction accuracy on new images: {0}%".format(new_img_grayscale_norm_pred_acc * 100))
[Grayscale Normalised] Prediction accuracy on new images: 100.0%

We achieve a perfect 100% accuracy on the new images, compared with 97.75% on the original test set. In the future, we could explore blurring the new images or modifying their contrast to see how the model handles those changes.
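Such blur and contrast perturbations could be sketched as follows (numpy-only stand-ins; a real pipeline would more likely use `cv2.GaussianBlur`, and the helper names here are hypothetical):

```python
import numpy as np

def box_blur(img, k=3):
    """Naive k x k mean blur on a 2D grayscale image."""
    pad = k // 2
    padded = np.pad(img.astype(np.float32), pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return (out / (k * k)).astype(img.dtype)

def adjust_contrast(img, factor):
    """Scale pixel deviations around the mean intensity (factor < 1 lowers contrast)."""
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0, 255).astype(img.dtype)

img = np.random.randint(0, 255, size=(32, 32), dtype=np.uint8)
blurred = box_blur(img)
low_contrast = adjust_contrast(img, 0.5)
print(blurred.shape, low_contrast.shape)  # (32, 32) (32, 32)
```

Running the model on perturbed copies of the five new images would show how robust the 100% result really is.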

Next we will explore the distribution of our top 5 Softmax probabilities for each image and will plot them.

Calculating Top 5 Softmax Probabilities

In [597]:
new_img_grayscale_clahe_norm_top5_probs = me_g_clahe_norm_take2_drpt_0_50_3x3.predict(new_img_grayscale_clahe_normalised, top_5=True)
In [598]:
new_img_grayscale_clahe_norm_softmax_lbs_names = list(map(lambda cids: class_ids_to_labels(cids), new_img_grayscale_clahe_norm_top5_probs.indices))
In [599]:
new_img_grayscale_clahe_norm_softmax_and_labels = (list(zip(new_img_grayscale_clahe_norm_top5_probs.values, new_img_grayscale_clahe_norm_softmax_lbs_names)))
In [600]:
# Let's see the Softmax probabilities for the first image to have an idea what the format looks like
new_img_grayscale_clahe_norm_top5_probs.values[0]
Out[600]:
array([  9.99998689e-01,   1.20421475e-06,   5.40128191e-08,
         1.49968162e-08,   2.24609775e-09], dtype=float32)
In [601]:
# Let's do the same for the predicted labels
new_img_grayscale_clahe_norm_top5_probs.indices[0]
Out[601]:
array([8, 7, 4, 0, 5], dtype=int32)
In [602]:
# And now let's see what our zipped list of softmax probabilities with label names looks like
new_img_grayscale_clahe_norm_softmax_and_labels[0]
Out[602]:
(array([  9.99998689e-01,   1.20421475e-06,   5.40128191e-08,
          1.49968162e-08,   2.24609775e-09], dtype=float32),
 ['Speed limit (120km/h)',
  'Speed limit (100km/h)',
  'Speed limit (70km/h)',
  'Speed limit (20km/h)',
  'Speed limit (80km/h)'])

Visualising Our SoftMax Probabilities

We show each new image next to its computed top 5 softmax probabilities. The green bars indicate the ground-truth label.

In [603]:
def show_imgs_softmax_probs(imgs, softmax_probs_lb_names, correct_lbs, fig_size=(15, 7)):
    rows = len(imgs)
    fg, ax = plt.subplots(nrows=rows, ncols=2, figsize=fig_size)
    for i, prob_lb in enumerate(softmax_probs_lb_names):
        correct_lb = correct_lbs[i]
        img = imgs[i]
        ax[i,0].imshow(img)
        ax[i,0].axis("off")

        lb_names = prob_lb[1]
        y_pos = np.arange(len(lb_names))
        vals = prob_lb[0]
        for j in range(0, len(lb_names)):
            c = "g" if lb_names[j] == correct_lb else "r"
            ax[i, 1].barh(j, vals[j], color=c, label="{0}".format(lb_names[j]))

        ax[i, 1].set_yticks(y_pos)
        ax[i, 1].set_yticklabels(lb_names)
        ax[i, 1].invert_yaxis()  # labels read top-to-bottom
        ax[i, 1].set_xlabel('Probability (log scale)')
        ax[i, 1].set_title('Softmax')  
        ax[i, 1].set_xscale('log')

        ax[i, 1].legend(loc='lower right')
    
    fg.tight_layout()
    plt.show()
In [605]:
show_imgs_softmax_probs(new_imgs, new_img_grayscale_clahe_norm_softmax_and_labels, new_img_lbs_text, fig_size=(15, 20))

We can clearly see that our model is quite confident in its predictions. Even in the worst case (the last image), the second most likely prediction has a probability of only around 10^-3. In fact, the model struggles most on that last image, which is actually a design and not even a real photograph. Overall, we have developed a strong model!


Visualizing feature maps

This section is not required to complete the project, but acts as an additional exercise in understanding the outputs of a neural network. While neural networks can be great learning devices, they are often referred to as black boxes. We can better understand what a network has learned by plotting its feature maps: after successfully training your neural network, you can see what its feature maps look like by plotting the output of the network's weight layers in response to a test stimulus image. From these plotted feature maps, it's possible to see what characteristics of an image the network finds interesting. For a sign, maybe the inner feature maps react with high activation to the sign's boundary outline or to the contrast of the sign's painted symbol.

Provided below is the function code that allows you to get the visualization output of any TensorFlow weight layer you want. The inputs to the function should be a stimulus image (one used during training or a new one you provide) and the TensorFlow variable name that represents the layer's state during training. For instance, if you wanted to see what the LeNet lab's feature maps looked like for its second convolutional layer, you could enter conv2 as the tf_activation variable.

For an example of what feature map outputs look like, check out NVIDIA's results in their paper End to End Learning for Self-Driving Cars, in the section Visualization of Internal CNN State. NVIDIA showed that their network's inner weights had high activations on road boundary lines by comparing feature maps from an image with a clear path to one without. Try experimenting with a similar test to show that your trained network's weights are looking for interesting features, whether by comparing feature maps from images with and without a sign, or by comparing feature maps from a trained network with those of a completely untrained one on the same sign image.
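The `show_conv_feature_maps` helper used below lives in `ModelExecutor` and is not shown here. The core idea of turning an activation tensor of shape `(1, H, W, C)` into one viewable image can be sketched (numpy-only; displaying the result is left to `plt.imshow(grid, cmap='gray')`) as:

```python
import numpy as np

def tile_feature_maps(activation, cols=8):
    """Arrange a (1, H, W, C) activation volume into a (rows*H, cols*W) grid
    so all C feature maps can be shown side by side as a single image."""
    _, h, w, c = activation.shape
    rows = int(np.ceil(c / cols))
    grid = np.zeros((rows * h, cols * w), dtype=np.float32)
    for i in range(c):
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = activation[0, :, :, i]
    return grid

# e.g. the first conv layer's output below has shape (1, 30, 30, 32)
act = np.random.rand(1, 30, 30, 32).astype(np.float32)
grid = tile_feature_maps(act)
print(grid.shape)  # (120, 240)
```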

Combined Image

Your output should look something like the combined feature-map image above (not reproduced in this export).

In [590]:
selected_new_img = np.reshape(new_img_grayscale_clahe[0], (32, 32, 1))
In [591]:
me_g_clahe_norm_take2_drpt_0_50_3x3.show_conv_feature_maps(selected_new_img, conv_layer_idx=0, fig_size=(20, 10), title_y_pos=1)
Fetching tensor: EdLeNet_Grayscale_CLAHE_Norm_Take-2_3x3_Dropout_0.50/conv_0_relu:0
Shape of activation layer: (1, 30, 30, 32)

We can see that the network focuses strongly on the edges of the circle and, to some extent, on the 120. The background is ignored.

In [592]:
me_g_clahe_norm_take2_drpt_0_50_3x3.show_conv_feature_maps(selected_new_img, conv_layer_idx=1, fig_size=(20, 15), title_y_pos=1)
Fetching tensor: EdLeNet_Grayscale_CLAHE_Norm_Take-2_3x3_Dropout_0.50/conv_1_relu:0
Shape of activation layer: (1, 13, 13, 64)

It is rather hard to determine what the network is focusing on in layer 1, but it seems to activate around the edges of the circle and in the middle, where the 120 is printed.

In [593]:
me_g_clahe_norm_take2_drpt_0_50_3x3.show_conv_feature_maps(selected_new_img, conv_layer_idx=2, fig_size=(20, 20), title_y_pos=1)
Fetching tensor: EdLeNet_Grayscale_CLAHE_Norm_Take-2_3x3_Dropout_0.50/conv_2_relu:0
Shape of activation layer: (1, 4, 4, 128)

This activation map is also hard to decipher, but once again the network seems to react to stimuli on the edges and in the middle.

In [ ]: